EXHIBIT A 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

Applicants: Robert James, et al. Examiner: Juliet C. Switzer 

Serial No.: 10/800,322 Art Unit: 1634 

Filed: March 12, 2004 Docket: 17530 

For: NUCLEIC ACID MARKERS FOR USE Confirmation No.: 2289 

IN DETERMINING PREDISPOSITION 
TO NEOPLASM AND/OR ADENOMA 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

DECLARATION OF DR. SUSANNE PEDERSEN 
UNDER 37 C.F.R. §1.132 

Sir: 

I, Susanne Pedersen, hereby declare as follows: 

1. I am the Chief Scientific Officer at Clinical Genomics Pty Ltd, CG (Sydney, 
Australia) and am responsible for managing CG research and development. I joined Clinical 
Genomics in February 2007 after six years as a Senior Research Scientist and Project Leader at 
Proteome Systems Ltd (now known as Tyrian Diagnostics, Sydney) where I managed 
biomarker discovery in respiratory diseases, autoimmune disorders and cancer. I am the 
inventor of several biomarker-discovery method patents. 

2. I hold a Master of Science (MSc) degree and a Doctorate Degree in Molecular 
Biology from Southern University of Denmark (Odense, Denmark) and The Royal National 
Hospital (Copenhagen, Denmark). I have authored a number of scientific publications, 
including publications relating to biomarker discovery and biomarker method development. A 
true and correct copy of my curriculum vitae is attached hereto as Exhibit 1. 



3. I have reviewed the above-identified application (the '322 application), and am 
familiar with the subject matter disclosed therein. I understand that the subject matter presently 
claimed in the '322 application is directed to, inter alia, a method for determining the onset of 
colorectal adenoma in a human by measuring the level of expression, in a blood, serum, stool 
or gastrointestinal tract sample, of a nucleic acid molecule that contains the nucleotide 
sequence as set forth in SEQ ID NO: 7, or a nucleic acid molecule that contains a nucleotide 
sequence complementary to SEQ ID NO: 7. 

4. I have read the Office Action dated February 26, 2009 issued in the '322 
application, and I have been asked to provide comments on issues raised in the Office Action. 

Relationship between SEQ ID NO: 7 and KIAA1199 

5. The Examiner stated in the Action that she was not able to establish a 
relationship between instant SEQ ID NO: 7 and the sequence set forth in GenBank database 
under accession No. AB033025. 

6. KIAA1199 is the designation of the genomic sequence available in the NCBI 
GenBank database under accession number AB033025. The NCBI RefSeq annotation for the 
KIAA1199 gene also provides one representative mRNA transcript of KIAA1199. Based on 
this annotation and using publicly available bioinformatics tools including, for example, BLAST 
or ClustalW for alignment of AB033025 and SEQ ID NO: 7, it is my determination that SEQ 
ID NO: 7 lies between exon 1 and exon 2 of the NCBI RefSeq KIAA1199 gene locus. 
Therefore, if one were to align SEQ ID NO: 7 with the mRNA sequence of KIAA1199 (which 
is apparently what the Examiner did), one would not find any similarity between the two 
sequences. 
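
The reasoning in paragraph 6 can be illustrated with a minimal sketch. The sequences below are invented placeholders (they are not SEQ ID NO: 7 or any part of AB033025), and a plain substring search stands in for an alignment tool such as BLAST or ClustalW. It only shows why a query lying between two exons matches the unspliced locus but not the spliced representative transcript.

```python
# Toy illustration only: every sequence here is a made-up placeholder,
# not SEQ ID NO: 7 and not taken from GenBank accession AB033025.
EXON_1 = "ATGGCTGACCTGAAG"
INTRON_1 = "GTAAGTCCTTAGGCATTCGATTTCAG"
EXON_2 = "GGTCTGCACTTCGAA"

genomic_locus = EXON_1 + INTRON_1 + EXON_2  # unspliced gene locus
spliced_mrna = EXON_1 + EXON_2              # representative transcript, intron removed

# A query taken from inside the intron, standing in for a sequence
# that lies between exon 1 and exon 2.
intronic_query = INTRON_1[4:20]


def contains(subject: str, query: str) -> bool:
    """Exact-match search; a simplified stand-in for a real alignment."""
    return query in subject


found_in_locus = contains(genomic_locus, intronic_query)
found_in_mrna = contains(spliced_mrna, intronic_query)
print("match in genomic locus:", found_in_locus)
print("match in spliced mRNA:", found_in_mrna)
```

In this toy case the query is found in the genomic locus and absent from the spliced transcript, which mirrors the outcome described for an alignment against the representative mRNA alone.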

7. While the NCBI RefSeq database only annotates one representative transcript 
transcribed from the KIAA1199 gene locus, the AceView database, which is the NCBI 
EST/cDNA database also available in the art, indicates the existence of other splice variant 
forms (i.e., mRNA) transcribed from the KIAA1199 gene. In reviewing the AceView 
database, the KIAA1199 splice variants "j-u", "h-u", "i-u" and "g-u" all represent 
experimentally identified cDNA clones transcribed from so-called "intronic" sequences from 
the RefSeq annotated KIAA1199 gene. See Exhibit 2. SEQ ID NO: 7 is simply another 
example of such transcripts. 

8. Further, the RefSeqGene annotation has various weaknesses, as acknowledged by 
the NCBI web site: 

"RefSeqGene, a subset of NCBI's Reference Sequence (RefSeq) project, defines 
genomic sequences of well-characterized genes to be used as reference 
standards. These sequences, labeled with the keyword RefSeqGene, serve as 
a stable foundation for reporting mutations, for establishing conventions for 
numbering exons and introns, and for defining the coordinates of other 
biologically significant variation. RefSeq mRNA and protein sequences 
already support these functions, but have the obvious weakness of not 
providing explicit coordinates for flanking or intronic sequence. RefSeq 
chromosome sequences also support these functions, but have awkwardly 
large coordinate values that will change when the sequence is updated with a 
new genome build. Sequences of the RefSeqGene project will counter both of 
these drawbacks by providing gene-specific genomic sequence for each gene, 
as well as including upstream and downstream flanking regions. If 
modifications must be made to any RefSeqGene sequence, it will be versioned 
and tools will be provided to facilitate conversion of coordinates. The 
RefSeqGene sequences will also be placed on the reference chromosome, and 
current chromosome coordinates will be available because of that 
realignment." (Emphasis added.) 

9. The following additional experimental data in Exhibit 3 are provided to show 
that SEQ ID NO: 7 is in fact a transcript of the KIAA1199 gene. The experiments were either 
performed by myself or under my direct supervision, and data interpretation and resulting 
reports were conducted by me. 

(i) Figure 1 illustrates the differential expression across the 29 exons of KIAA1199 
as designated in NCBI GenBank by measuring the hybridization of RNA extracted from colon 
specimens from 30 normal subjects and 19 subjects with either adenomas or colorectal cancer. 
All probe sets across the 29 exons with the single exception of exon 18 show upregulation of 
RNA derived from the KIAA1199 gene locus. For example, RNA samples derived from exon 
1 and exon 2, which flank SEQ ID NO: 7, show up-regulation in colorectal neoplasia. In 
contrast, the flanking genes to KIAA1199 on the (+) strand, MESDC1 and FAM108C1, were 
not differentially expressed (Figure 2). 

(ii) Figure 3 illustrates the generation of a dominant PCR product using a forward 
primer in the 3'-end of SEQ ID NO: 7 and a reverse primer in the 5'-end of NCBI GenBank 
exon 2 using an end-point PCR based analysis with RNA extracted from colon tissue 
specimens from 2 normal subjects and 2 subjects with adenomas or cancers. These data 
support the notion that SEQ ID NO: 7 is an RNA transcript derived from the KIAA1199 gene 
locus. 

10. Clearly, SEQ ID NO: 7 is an integral part of the genomic sequence of 
KIAA1199, and KIAA1199 is transcribed with the SEQ ID NO: 7 sequence forming part of the 
mRNA. 

Sample Source 

11. The Examiner also raised the issue that the '322 application only teaches 
overexpression of SEQ ID NO: 7 in colorectal tissue biopsy samples, and experimentation and 
data for fecal and blood samples are absent. 

12. Experiments were conducted under my supervision, which demonstrate 
upregulation in the level of expression of KIAA1199 in stool samples and in serum samples. 
Specifically, the results in Exhibit 4, Part 1, show that increased levels of KIAA1199 protein in 
stool samples were detected using an indirect ELISA with a monoclonal antibody directed to 
KIAA1199 protein. In my opinion, increased levels of KIAA1199 mRNA transcripts would 
also have occurred in these stool samples. 

13. It is known that neoplastic colorectal epithelial cells may be exfoliated into the gut 
lumen and, concomitantly, the stool, where they are detectable. Leakage, escape and migration 
(sometimes involving apoptosis and anoikis) of neoplastic cells into the lymphatic or peripheral 
blood vessel system are well documented in the literature. This phenomenon is a characteristic 
of metastasis in which neoplastic cells spread from the original organ to a secondary, perhaps 
distant, organ via the vasculature. Appearance of colorectal neoplastic cells in the blood and 
detection of their cellular content (e.g., RNA, protein and DNA) has been documented in the 
literature. For example, Galamb et al. (2008) (also provided in Exhibit 4) demonstrated the 
fact that colorectal tissue RNA markers can be found in the peripheral blood. Further, the 
results in Exhibit 4, Part 2, show that approximately 30% of adenoma patients exhibited a 
significant increase in KIAA1199 mRNA in the plasma. 

14. It is noted that the techniques employed in the experiments shown in Exhibit 4 
were all available to those skilled in the art when the '322 application was first filed. 
Therefore, it is my opinion that once an upregulated expression of a biomarker is established 
based on tissue biopsy samples, the experimentation involved in confirming that elevated 
expression can also be detected in stool and blood samples would be routine and not excessive. 

Determination of an increase in expression 

15. The '322 application discloses that clones 8-2d and 12-2f, to which SEQ ID NO: 
7 corresponds, were up-regulated by 50 and 45 fold, respectively, in adenoma tissue samples. 
Additional experiments were performed under my supervision, which also show that there is a 
statistically significant increase in SEQ ID NO: 7 levels in adenoma patients as compared to 
normal patients. These additional validation data were presented as Exhibit 3 in the Response 
filed in the '322 application on December 1, 2008. 

16. In Exhibit 5 attached hereto, the original discovery data (provided in the '322 
application) and the further validation data (submitted in the Response filed in the '322 
application on December 1, 2008) are presented in Figures 4 and 5, respectively, in a 
graph format. It is clear from the graphs in Figures 4 and 5 that the mean level of expression of 
KIAA1199 in normal patients as opposed to adenoma patients is statistically significantly 
different. There is a statistically significant increase of expression of SEQ ID NO: 7 in each of 
the graphs. The Examiner should not be focusing on the few patients who appear above and 
below the mean value, i.e. within the standard deviation, also referred to as the overlapping 
"whiskers" between the normal and adenoma data sets. This is equivalent to the ends of the 
bell curves shown in the previously presented bell diagram. It is the middle box which the 
Examiner should focus on since this represents the data median. The top and bottom of the box 
are the 25% and 75% interquartile range. The whiskers represent the minimum and maximum 
observations which are considered outliers. The phenotypes are differentially expressed by a 
t-test where P<0.001, which represents a statistically significant difference as between the mean 
of the normal patients as between the 25% and 75% interquartile range and the mean of the 
adenoma patients. 
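
The elements of a box plot described in paragraph 16 (median, quartile box, minimum and maximum whiskers) and a two-sample t statistic can be computed as in the following minimal sketch. The expression values are hypothetical numbers chosen for illustration and are not data from the '322 application or its exhibits; the Welch form of the t statistic is an assumption, since the declaration does not state which t-test was used.

```python
import math
import statistics

# Hypothetical expression values (arbitrary units); NOT data from the exhibits.
normal = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.4]
adenoma = [3.9, 5.2, 4.4, 1.3, 6.1, 4.8, 5.5, 4.1]


def box_summary(values):
    """Five-number summary: whisker ends (min, max), box edges (q1, q3), median."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances assumed)."""
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


normal_box = box_summary(normal)
adenoma_box = box_summary(adenoma)
t_stat = welch_t(adenoma, normal)

# Whiskers may overlap even when the quartile boxes are clearly separated.
whiskers_overlap = adenoma_box["min"] < normal_box["max"]
boxes_separated = adenoma_box["q1"] > normal_box["q3"]
print(normal_box, adenoma_box, round(t_stat, 2), whiskers_overlap, boxes_separated)
```

With these illustrative numbers one adenoma value falls inside the normal range, so the whiskers overlap, while the quartile boxes do not. Converting the t statistic to a P value would require a t-distribution function, which is left out of this sketch.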

17. In terms of the different fold changes which are observed in the data in the 
specification and the subsequently submitted validation data, this is due to the fact that the data 
were generated from fundamentally different technologies which exhibit differences in their 
limits of detection. Depending on what the limit of detection is of a particular technology, this 
may result in differences in the actual fold change value which one obtains. The actual 
quantified levels of a molecule being measured cannot be compared directly as between two 
different types of technology. The data which appeared in the specification as originally filed 
were derived from gene specific RT-PCR, whereas the data which were submitted together 
with the Response filed in the '322 application on December 1, 2008 were generated from 
whole gene RNA microarrays. Although two entirely different technologies with different 
limits of detection were used, the ultimate outcome is consistent. That is, the mean level of 
expression of SEQ ID NO: 7 is increased in patients who have developed adenoma as opposed 
to patients who have not. While a variety of screening techniques are suitable for use in 
practicing the method of the '322 application, those skilled in the art would appreciate that the 
same technique should be used for both normal and test subjects to have a meaningful 
comparison and diagnosis. 
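
The dependence of an observed fold change on a technology's limit of detection, as described in paragraph 17, can be sketched as follows. This is a simplified model under stated assumptions: any level below the limit of detection is assumed to read as the limit itself, and the abundance values and limits are hypothetical, not measurements from the RT-PCR or microarray experiments.

```python
def observed_fold_change(test_level, normal_level, limit_of_detection):
    """Fold change when any level below the limit of detection reads as the limit.

    Simplified model for illustration; real platforms also differ in
    dynamic range, background and normalisation.
    """
    floor = limit_of_detection
    return max(test_level, floor) / max(normal_level, floor)


# Hypothetical true abundances (arbitrary units), not measured values.
true_normal, true_adenoma = 1.0, 50.0

# Same underlying samples, two hypothetical platforms with different limits.
sensitive = observed_fold_change(true_adenoma, true_normal, limit_of_detection=0.5)
less_sensitive = observed_fold_change(true_adenoma, true_normal, limit_of_detection=10.0)
print(sensitive, less_sensitive)
```

Under this model the two platforms report different fold-change values for the same samples, yet both report an increase, which is the sense in which differing fold changes can still be consistent in direction.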

18. I declare further that all statements made herein of my own knowledge are true 
and that all statements made on information and belief are believed to be true; and further that 
these statements were made with the knowledge that willful false statements and the like so 
made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the 
United States Code, and that such willful false statements may jeopardize the validity of the 
application or any patent issuing thereon. 




EXHIBIT 1 



Dr. Susanne Pedersen 

30 Michael Street North Ryde NSW Australia 2113 

Citizenship: Denmark 
Residency: Australia 

Date of Birth: 4th January 1972 

EDUCATION 

1996-2000 Doctor of Philosophy (Molecular Biology) 
Copenhagen Hospital, Denmark 

1995-1996 Master of Sciences (Molecular Biology) 
Copenhagen Hospital, Denmark 

1991-1995 Bachelor of Sciences (Cell Biology) 
Southern University of Denmark 



POSTDOCTORAL EXPERIENCE 

2003 - 2006 Proteome Systems Limited, North Ryde NSW 

SENIOR RESEARCH SCIENTIST - Key achievement 

• Invented and patented novel enrichment techniques for isolation of immunogenic 
disease-specific proteins. These methods are now used across several Discovery 
projects to identify biomarkers of diagnostic and prognostic values. 

• Responsible for design of experimental strategies for identification of pathogenic 
biomarkers for Cystic Fibrosis and Tuberculosis 

• Current TB Discovery Project leader 

• 2 years experience as a scientific adviser for a proteomic study of Bacillus subtilis for 
identification of new drug targets. This work includes project planning, budgeting, 
research evaluation and market analysis 



Mar - Dec 2002 PicoSep A/S, Odense, Denmark 

SCIENTIFIC ADVISER 

• Scientific adviser for the commercialisation of novel surface chemistry technology 

• Reporting directly to the CEO, I provided experimental strategies to evaluate 
commercialisation opportunities for the technology 

• Additional tasks involved setting up of a laboratory and training of staff 



2000 - 2002 Proteome Systems Limited, North Ryde NSW 

RESEARCH SCIENTIST 

• Optimised sample preparation for 2DE profile analysis of ovarian cancer 

• Optimised sample preparation and MS analysis for detection of low-abundant proteins 

• Developed 2DE arraying methods for 'difficult' proteins, e.g. membrane, acidic 
and alkaline proteins. 



EDUCATION 

1996-2000 Clinical Department of Biochemistry, Copenhagen University Hospital 
(Rigshospitalet), Denmark. 
PhD Molecular Biology. Thesis title: "Translational control of insulin-like 
growth factor II mRNAs". 
Supervisor: Professor Finn Cilius Nielsen 

1995-1996 Clinical Department of Biochemistry, Copenhagen University Hospital 
(Rigshospitalet), Denmark. 
Master Degree, Molecular Biology. Project title: "Translational control of IGF-II 
mRNAs". 
Supervisor: Professor Finn Cilius Nielsen 

1991-1995 Molecular and Biological Department, Southern University of 
Denmark, Odense, Denmark 
Bachelor Degree, Cell Biology. Project title: "Analysis of the cea-kil operon 
from the ColE1 plasmid". Supervisor: Dr. Kenn Gerdes 

PATENTS 

AU 2004/252182 A1 Method of isolating a protein 

WO 2005/001480 A1 METHOD OF ISOLATING A PROTEIN 

US 2007/0178541 A1 Method of isolating a protein 

US 2009/0208535 A1 Novel Methods of Diagnosis of Treatment of P. Aeruginosa Infection 
and Reagents Therefor 

AU 2005/256177 A1 Novel methods of diagnosis of treatment of P. aeruginosa infection and 
reagents therefor 

WO 2006/000056 A1 NOVEL METHODS OF DIAGNOSIS OF TREATMENT OF P. 
AERUGINOSA INFECTION AND REAGENTS THEREFOR 

PUBLICATIONS 

• Pedersen SK, Sloane AJ, Prasad SS, Sebastian LT, Lindner RA, Hsu M, Robinson M, 
Bye PT, Weinberger R and Harry JL (2005). An immunoproteomic approach for 
identification of clinical biomarkers for monitoring disease: Application to cystic fibrosis. 
Mol Cell Proteomics, 4 (8): 1052-1060 

• Sloane AJ, Lindner RA, Prasad SS, Sebastian LT, Pedersen SK, Robinson M, Bye PT, 
Nielsen DW, Harry JL (2005). Proteomic analysis of sputum from adults and children with 
cystic fibrosis and from control subjects. Am J Respir Crit Care Med 172 (11): 1416-1426 

• Hunt, S.M., Thomas, M.R., Sebastian, L.T., Pedersen, S.K., Harcourt, R.L., Sloane, A.J., 
and Wilkins, M.R. (2005). Optimal Replication and the Importance of Experimental Design 
for Gel-based Quantitative Proteomics. Journal of Proteome Research, 4 (3): 809-819 

• Pedersen SK, Harry JL, Sebastian L, Baker J, Traini MD, McCarthy JT, Manoharan A, 
Wilkins MR, Gooley AA, Righetti PG, Packer NH, Williams KL and Herbert BR (2003). 
The unseen proteome: mining below the tip of the iceberg to find low abundance and 
membrane proteins. J Proteome Res. 2 (3): 303-11 

• Harcourt, R.L., Cole, R.A., Harry, E.J., Lindner, R.A., Pedersen, S.K., Prasad, S.S., 
Sebastian, L.T., Schulz, B.L., Sloane, A.J. and Harry, J.L. (2003). Proteomics: The 
paradigm for biomarker and drug target discovery. Journal of Medical Technology, 47 
(11), Special Issue 'Clinical Protein Diagnostics: Direction to Proteomics'. 

• Herbert, B.R., Pedersen, S.K., Harry, J.L., Sebastian, L., Traini, M.D., McCarthy, J.T., 
Wilkins, M.R., Gooley, A., Packer, N.H., Williams, K., Righetti, G. and Grinyer, J. (2003). 
Mastering proteome complexity using two-dimensional gel electrophoresis. 
PharmaGenomics Sept 1: 22-36 

• Herbert, B., Galvani, M., Hamdan, M., MacCarthy, J., Pedersen, S., Righetti, P.G. (2001). 
Reduction and alkylation of proteins in preparation of two-dimensional map analysis: why, 
when and how? Electrophoresis, 22: 2048-2057 

• Herbert, B., Harry, J.L., Packer, N.H., Gooley, A.A., Pedersen, S.K., Williams, K. (2001). 
What place for polyacrylamide in proteomics? TRENDS in Biotechnology, 19 (10): S3-S9 

• Pedersen SK, Christiansen J, v. O. Hansen T, Larsen MR, and Nielsen FC (2001). 
Human IGF-II leader 2 mediates internal initiation of translation. Biochem J, 363 (Pt 1): 
37-44. 

• Pedersen S, Celis JE, Nielsen J, Christiansen J and Nielsen FC (1996). Distinct 
repression of translation by wortmannin and rapamycin. Eur. J. Biochem. 247: 449-456. 

• Breuner A, Jensen RB, Dam M, Pedersen S, Gerdes K. (1996). The centromere-like 
parC locus of plasmid R1. Molecular Microbiology 20 (3): 581-92. 



GRANTS 

Foundation for Innovative New Diagnostics (FIND), UNICEF/UNDP/World Bank/WHO 
Special Programme for Research and Training in Tropical Diseases (TDR): 

Lodged 29 April 2005. Identification of in vivo expressed Mycobacterium tuberculosis 
proteins as biomarkers of infection. PI: Pedersen SK 



LABORATORY SKILLS AND EXPERIENCE 



MALDI-TOF MS 
Post-source decay (PSD) MS 
MALDI TOF TOF 
R2 peptide purification 
MS analysis of phospho-proteins 
2DE image analysis 
Immunoproteomics (immuno-capture and immuno-separation) 
Experienced in working with clinical biohazard samples 
Immuno-precipitation 
2DE PAGE 
1DE PAGE 
Western, Northern and Southern blotting 
Sample preparation for 2DE analysis, such as isolation of membrane proteins, acidic/alkaline proteins 
pI-based protein pre-fractionation techniques 
Coupling of various ligands (mRNA and proteins) to CnBr sepharose 
Protein extraction for Gram positive bacteria 
Bioinformatics 
Protein modeling 
Guanidation of tryptic peptides 
Electrophoretic mobility shift assay 
DNA purification 
DNA sequencing 
PCR and DNA cloning 
mRNA purification 
Expression and purification of His/GST tagged proteins 
Cell transfection 
Confocal laser microscopy 
Biotinylation of RNA 
UV crosslinking of DNA/RNA binding proteins 



EXHIBIT 2 



AceView: Gene:KIAA1199, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs. 

Homo sapiens gene KIAA1199, encoding KIAA1199. 

Summary 

RefSeq annotates one representative transcript (NM included in AceView variants), but Homo sapiens cDNA sequences in GenBank, 
filtered against clone rearrangements, coaligned on the genome and clustered in a minimal non-redundant way by the manually 
supervised AceView program, support at least 4 spliced variants. 

AceView summary 

Expression: According to AceView, this gene is expressed at high level, 2.1 times the average gene in this release. The sequence of 
this gene is defined by 243 GenBank accessions from 231 cDNA clones, some from bone (seen 39 times), hip (36), brain (16), colon 
(16), lung (15), nervous tumor (13), stomach (11) and 65 other tissues. We annotate structural defects or features in 6 cDNA clones. 
Alternative mRNA variants and regulation: The gene contains 29 different gt-ag introns. Transcription produces 10 different 
mRNAs, 4 alternatively spliced variants and 6 unspliced forms. There are 2 non overlapping alternative last exons and 3 validated 
alternative polyadenylation sites (see the diagram). The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, 
presence or absence of a cassette exon, overlapping exons with different boundaries. 102 bp of this gene are antisense to spliced gene 
MESDC2. r flApr07, raising the possibility of regulated alternate expression. 

Protein coding potential: 4 spliced and 2 unspliced mRNAs putatively encode good proteins, altogether 6 different isoforms (2 
complete, 2 COOH complete, 2 partial), some containing a vacuolar domain [Psort2]; 1 of the 2 complete proteins appears to be 
secreted. The remaining 4 mRNA variants (4 unspliced; 1 partial) appear not to encode good proteins. 
Function: There are 4 articles specifically referring to this gene in PubMed. Functionally, the gene has been tested for association to 
diseases (Hearing Loss; Kidney Neoplasms) and proposed to participate in a process (sensory perception of sound). Proteins are 
expected to localize in extracellular. A putative protein interactor has been described (PLXNA2). 

Please quote: AceView: a comprehensive cDNA-supported gene and transcripts annotation, Genome Biology 2006, 7(Suppl 1):S12 

Map on chromosome 15, links to other databases and other names 

Map: This gene KIAA1199 maps on chromosome 15, at 15q24 according to Entrez Gene. In AceView, it covers 172.44 kb, from 
78858735 to 79031173 (NCBI 36, March 2006), on the direct strand. 

Links to: manual annotations from OMIM 608366, the SNP view, gene overviews from Entrez Gene 57214, GeneCards, expression 
data from ECgene, UniGene, molecular annotations from UCSC, or our GOLD analysis. 
The previous AceView annotation is here. 

Other names: The gene is also known as KIAA1199 or TMEM2L, LOC57214. It has been described as KIAA1199, transmembrane 
protein 2-like. 

Closest AceView homologs in other species 

The closest mouse gene, according to BlastP, is the AceView gene 953{J013L23Rik (e=4 10^-38). 

The closest C. elegans genes, according to BlastP, are the AceView/WormGenes XD426, 4B57, 4D18, which may contain 
interesting functional annotation. 

The closest A. thaliana gene, according to BlastP, is the AceView gene ATiGSOSOG (e=0.27). 

Complete gene on genome diagram (in true scale, with colored introns) 

Gene KIAA1199 5' -> 3' encoded on plus strand of chromosome 15 from 78858735 to 79031173 

[Diagram of alternative mRNA variants; scale bar: 1 kb] 

Alternative mRNAs are shown aligned from 5' to 3' on a virtual genome where introns have been shrunk to a minimal length. Exon size 
is proportional to length, intron height reflects the number of cDNA clones supporting each intron. 

Mouse over the ending of each transcript gives tissues from which the supporting cDNAs were extracted. Click on any transcript to 
open the specific mRNA page, to see the exact cDNA clone support and eventual SNPs and to get details on tissues, sequences, mRNA 
and protein annotations. Details on tissue of origin for each intron and exon is available from the intron and exons table. Good 
predicted proteins are in pink or blue, yellow proteins may be partial or unconvincing. Proteins supported by a single continuous 
GenBank accession lead to underlining the name/ending of the variant. Names not underlined result from cDNA concatenation in the 
coding region and should be experimentally checked. 
http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?exdb=AceView&db=36a&term=kiaa1199&submit=Go 

Bibliography: 6 articles in PubMed 



EXHIBIT 3 



FIGURE 1 



TRANSCRIPTIONAL ACTIVITY ACROSS THE KIAA1199 GENE LOCUS 
Expression levels of NCBI [...] GeneChip microarray [...] hybridization level [...] 
manufacturer's instructions for the Affymetrix HuGene ST 1.0 array using RNA extracted from 
colon tissue from 30 normals (left bars), [...] adenomas (middle bars) and 21 colorectal 
cancers (right bars). Further information regarding methodology is available in Figure 3. 



FIGURE 2 

TRANSCRIPTIONAL ACTIVITY OF GENES FLANKING KIAA1199 GENE LOCUS 
(A) Gene activity of MESDC1 (downstream neighbor gene to KIAA1199 gene locus - located 
50,993nt downstream) and (B) FAM108C1 (upstream neighbor gene to KIAA1199 gene locus - 
located 23,722nt upstream) were determined by measuring level of RNA hybridization to probesets 
223264_at and 225436_a, respectively, using Affymetrix U133 Plus 2.0 GeneChips and RNA 
extracted from colon tissue specimens from 222 normals (black), 42 subjects with IBD (green), 29 
subjects with adenomas (blue) and 161 subjects with colorectal cancer (red). 



Diagnostic mRNA Expression Patterns of Inflamed, Benign, 
and Malignant Colorectal Biopsy Specimen and their 
Correlation with Peripheral Blood Results 

Orsolya Galamb, Ferenc Sipos, Norbert Solymosi, Sandor Spisak, Tibor Krenacs, 
Kinga Toth, Zsolt Tulassay, and Bela Molnar 

2nd Department of Medicine and 1st Department of Pathology and Experimental Cancer Research, Semmelweis University; 
and Molecular Medicine Research Unit, Hungarian Academy of Sciences, Budapest, Hungary 

Abstract 

Purpose: Gene expression profile (GEP)-based classification of colonic diseases is a new 
method for diagnostic purposes. Our aim was to develop diagnostic mRNA expression 
patterns that may establish the basis of a new molecular biological diagnostic method. 

Experimental Design: Total RNA was extracted, amplified, and biotinylated from frozen 
colonic biopsies of patients with colorectal cancer (n = 22), adenoma (n = 20), hyperplastic 
polyp (n = 11), inflammatory bowel disease (n = 21), and healthy normal controls (n = 11), 
as well as peripheral blood samples of 19 colorectal cancer and 11 healthy patients. 
Genome-wide gene expression profile was evaluated by HGU133plus2 microarrays. To 
identify the differentially expressed features, the significance analysis of microarrays and, 
for classification, the prediction analysis of microarrays were used. Expression patterns 
were validated by real-time PCR. Tissue microarray immunohistochemistries were done on 
tissue samples of 121 patients. 

Results: Adenoma samples could be distinguished from hyperplastic polyps by the 
expression levels of nine genes including ATP-binding cassette family A, member 8, 
insulin-like growth factor 1 and glucagon (sensitivity, 100%; specificity, 90.91%). Between 
low-grade and high-grade dysplastic adenomas, 65 classifier probesets such as aquaporin 1, 
CXCL10, and APOD (90.91/100) were identified; between colorectal cancer and adenoma, 
61 classifier probesets including axin 2, von Willebrand factor, tensin 1, and gremlin 1 
(90.91/100) were identified. Early- and advanced-stage colorectal carcinomas could be 
distinguished using 34 discriminatory transcripts (100/66.67). 

Conclusions: Whole genomic microarray analysis using routine biopsy samples is suitable 
for the identification of discriminative signatures for differential diagnostic purposes. Our 
results may be the basis for new GEP-based diagnostic methods. (Cancer Epidemiol 
Biomarkers Prev 2008;17(10):2835-45) 



Introduction 

Colorectal cancer is one of the most frequent cancers in the world with very high mortality. 
According to WHO data, 945,000 new colorectal cancer cases are registered worldwide, and 
almost 492,000 colorectal cancer-related deaths occur every year (1). Hence, the early 
diagnosis, the discrimination between genetically and expressionally different tumors, and 
in view of these, the enhancement of therapies, become necessary. The 5-year survival data 
also emphasize the importance of an early diagnosis of colorectal cancer. The 5-year 
survival rate is 80% to 90% in early colorectal cancer, 60% in case of nodal involvement, 
and <10% in metastatic colorectal cancer. 

According to the widely accepted adenoma-dysplasia-carcinoma sequence, most of the 
colorectal cancers develop on the basis of villous adenomas (2, 3). Recently published, 
however, was the concept of a "serrated neoplasia pathway" referring to a pattern of 
progression of colorectal cancer that involves hyperplastic polyps and serrated adenomas 
(4). The serrated pathway culminates in colorectal cancers with DNA microsatellite 
instability, mutation of BRAF, and extensive DNA methylation (5-7). Iino et al. (8) 
suggested that MSI-L hyperplastic polyps may be precursors of the subset (10%) of 
colorectal cancers showing the MSI-L phenotype. 

Gene expression analysis of colon biopsies using high-density oligonucleotide microarrays 
may help to detect such gene expression patterns that would establish the basis for new 
molecular biological diagnostic methods. Utilization of mRNA expression microarray data 
for diagnostic purposes has already begun. More and more scientific studies appear to focus 
on the gene expression background of colorectal cancer progression and metastasis 
development (9-18), characterization of colorectal cancer subtypes according to mRNA 
expression (12, 18, 19), the correlation of gene expression profile with clinicopathologic 
variables (12, 16, 20, 21), and mRNA expression-based prognosis (22). In addition to the 
surgical and biopsy tissue samples, mRNA expression analysis of peripheral blood samples 
may also play a crucial role in the establishment of early molecular-based diagnostics and 
prognostics of tumorous diseases (23-27). 
The handling and the evaluation of the huge amount of
data collected by microarray analyses require an extensive
bioinformatical background. Multivariate statistical
analysis is needed for the development of automatic
diagnostic disease classification methods.

Table 1. Number of patients in the different disease groups

                                   Biopsy samples, n = 85,            Biopsy samples, n = 92,  Blood samples,
                                   original set                       independent set          n = 30
Group                              Affymetrix  Taqman   Tissue        Tissue                   Affymetrix
                                   microarray  RT-PCR   microarray    microarray               microarray
Adenoma with low-grade dysplasia   9           6
Adenoma with high-grade dysplasia  11          6
CRC Dukes A-B                      10          6        2             20                       7
CRC Dukes C-D                      12          4        2             21                       12
Normal                             11          5        9             21                       11
Hyperplastic polyp                 11
Ulcerative colitis                 12          7        10            8
Crohn's disease                    9                    6             16
Undeterminate IBD                                                     7
Total patient numbers              85          34       29            93                       30

Abbreviation: CRC, colorectal cancer.

We have previously reported the discriminative
mRNA expression signatures between colorectal cancer
versus normal, adenoma versus normal, and inflammatory
bowel disease (IBD) versus normal samples, and between
the early and advanced stages of colorectal cancer (28).
However, the gene expression profile-based classification
of colonic diseases for diagnostic purposes has not
yet been solved. The results of the HGU133 Plus 2.0
whole genomic microarrays (which were also used in
our study) in colorectal diseases have been published
by only five research groups (29-33), and only two of
them used biopsy samples (29, 30). Using Affymetrix
microarrays, high-throughput disease-specific marker
screening can be done. Our aims in this study were to
develop diagnostic mRNA expression patterns for the
objective classification of inflammatory, benign, and
malignant colorectal diseases, and to compare the gene
expression background of adenomas and hyperplastic
polyps as the possible points of origin of colorectal
cancer. Furthermore, we analyzed the presence in
peripheral blood of certain local colorectal cancer markers
that had been identified using biopsy samples. This is
necessary for the development of blood-based, disease-
specific diagnostic screening.



Materials and Methods 

Patients and Samples. After the informed consent of
untreated patients, colon biopsy samples were taken
during endoscopic intervention and stored in RNALater
Reagent (Qiagen, Inc.) at -80°C. Additionally, 9 ml of
peripheral blood samples of untreated patients were
taken into Paxgene Blood RNA Tubes (Qiagen) before
colonoscopy. The blood samples were also stored at
-80°C. Altogether, 377 tissue samples (85 fresh frozen
and 292 formalin-fixed paraffin-embedded tissue samples)
and peripheral blood samples of 19 colorectal cancer
and 11 healthy patients were analyzed in our study, as
well as the blood smears of 10 healthy and 10 colorectal
cancer patients. Total RNA was extracted, and Affymetrix
microarray analysis was done on the biopsies of
patients with tubulovillous/villous adenomas (n = 20;
11 with high-grade dysplasia and 9 with low-grade
dysplasia), colorectal adenocarcinoma (n = 22),
hyperplastic polyps (n = 11), and healthy normal controls
(n = 11), as well as from peripheral blood samples of 19
patients with colorectal cancer and 11 healthy patients.
Fifty-two microarrays (8 normal, 15 adenoma, 15
colorectal cancer, 14 IBD) had been hybridized earlier;
their data files were used in a previously published study
using different comparisons (28) and are available in the
Gene Expression Omnibus database (series accession
number: GSE4183). The data sets of the newly hybridized
63 microarrays are registered under the series accession
numbers GSE10714 (33 microarrays from biopsy samples:
3 normal, 11 hyperplastic polyps, 5 adenoma, 7 colorectal
cancer, 7 IBD) and GSE10715 (30 microarrays from blood
samples: 19 colorectal cancer and 11 normal). The
diagnostic groups and the number of
patients in each group are represented in Table 1.
Detailed patient specification is described in
Supplementary Table S1.

Methods 

mRNA Expression Microarray Analysis. Total RNA
was extracted using the RNeasy Mini Kit (Qiagen) for
biopsy samples and the Paxgene Blood RNA Kit (Qiagen)
for peripheral blood samples according to the
manufacturers' instructions. The isolated peripheral blood
RNA samples were concentrated using the GeneChip Blood
RNA Concentration Kit (Affymetrix, Inc.). The quantity
and the quality of the isolated RNA were tested by
measuring the absorbance and by agarose gel electrophoresis
or capillary gel electrophoresis using the 2100 Bioanalyzer
and RNA 6000 Pico Kit (Agilent, Inc.). Biotinylated
cRNA probes were synthesized from 5 to 8 µg total RNA
and fragmented using the One-Cycle Target Labeling
and Control Kit 4 according to the Affymetrix description.
In case of peripheral blood RNA samples, 5 µg total RNA
was used for cRNA probe synthesis, and during reverse
transcription Globin Reduction PNA oligomers (Applied
Biosystems) were applied to reduce the amount of globin
transcripts. Ten micrograms of each fragmented cRNA
sample were hybridized into HGU133 Plus 2.0 array



4 littps:/ '/wv\Aw.8ftyf*$trtx 
s2_roanual.pdf 
Figure 1. Random forest. A. Heat map of the diagnostic groups separated using the random forest classification method. The heatmap
visualizes the expression level of genes (rows) that were selected as classifiers using the random forest supervised machine learning
method. One can see the difference in gene expression between the different diagnostic groups (columns). B. Agreement plot
for visualization of the confusion matrix of the true and the predicted classes. The agreement plot is the representation of the strength
of agreement in the confusion matrix of the observed (true) and predicted classes. The prediction of each sample was based on the
classifier using the genes presented on the heatmap. Black areas show the observed agreement positioned within larger rectangles
representing the maximum possible agreement, given the marginal totals. Gray areas represent the degree of disagreement. AD,
adenoma; CRC, colorectal cancer; IBD, inflammatory bowel disease.



(Affymetrix) at 45°C for 16 h. The slides were washed
and stained using Fluidics Station 450 and an antibody
amplification staining method according to the
manufacturer's instructions. The fluorescent signals were
detected by a GeneChip Scanner 3000.

Statistical Evaluation of mRNA Expression Profiles

Preprocessing and Quality Control. Quality control
analyses were done according to the suggestions of The
Tumour Analysis Best Practices Working Group (34).
Scanned images were inspected for artifacts, and the
percentage of present calls (>25%) and control of the
RNA degradation were evaluated. Based on the evaluation
criteria, all biopsy measurements fulfilled the
minimal quality requirements. The Affymetrix expression
arrays were preprocessed by gcRMA with quantile
normalization and median polish summarization. The
data sets are available in the Gene Expression Omnibus
databank for further analysis, 5 series accession numbers
GSE4183, GSE10714, and GSE10715.
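
As a rough illustration of the quantile normalization step only (not of the full gcRMA procedure, which also includes background adjustment and median polish summarization, and not the Bioconductor code used in the study), the idea can be sketched in plain Python. The function name and the toy intensities are hypothetical; ties are broken by input order rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array).

    Every array is forced onto the same reference distribution, which is
    the mean of the sorted intensities across arrays at each rank.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: rank-wise mean of the sorted arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered by increasing intensity (stable for ties).
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data, for illustration only: two arrays with three probes each.
toy = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization both toy arrays contain the same set of values, and only the rank of each probe within its own array is preserved.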

Further Analyses. To identify differentially expressed
features, significance analysis of microarrays was used.
The nearest shrunken centroid method (prediction
analysis of microarrays) was applied for sample
classification from gene expression data. For gene selection,
the random forest classification algorithm was used (35),
whereas the .632+ bootstrap method was applied to
estimate the prediction error rate (36). The confusion
matrix of the true and the predicted classes was
visualized on agreement plots (37). The preprocessing,
data mining, and statistical steps were done using the
R environment with Bioconductor libraries.
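
For orientation, the way a .632+ bootstrap estimate combines its ingredients can be sketched as below. This is a minimal sketch of one commonly cited formulation of the estimator, not the R code used in the study; the function name and the example error rates are hypothetical. The inputs are assumed to be the resubstitution (training) error, the out-of-bag bootstrap error, and the no-information error rate.

```python
def bootstrap_632_plus(err_resub, err_oob, gamma):
    """Combine error rates into a .632+ style estimate.

    err_resub: error on the training data (optimistic).
    err_oob:   mean error on samples left out of each bootstrap draw.
    gamma:     no-information error rate (error if labels were unrelated
               to the predictors).
    """
    # Cap the out-of-bag error at the no-information rate so that the
    # relative overfitting rate stays within [0, 1].
    err_oob_capped = min(err_oob, gamma)

    if err_oob_capped > err_resub and gamma > err_resub:
        overfit = (err_oob_capped - err_resub) / (gamma - err_resub)
    else:
        overfit = 0.0

    # The weight moves from 0.632 (no overfitting) to 1.0 (maximal overfitting).
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * err_resub + weight * err_oob_capped


# Hypothetical inputs, for illustration only (not values from the study).
example_estimate = bootstrap_632_plus(err_resub=0.0, err_oob=0.2, gamma=0.5)
```

With no overfitting the result reduces to the plain .632 weighting of the two error rates; with heavy overfitting it approaches the out-of-bag error.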

Taqman Real-time PCR. TaqMan real-time PCR (RT-PCR)
was used to measure the expression of 26 selected
genes using an Applied Biosystems Micro Fluidic Card
System. The selected genes belonged to the prediction
analysis of microarrays top 200 genes in the colorectal
cancer versus normal, adenoma versus normal, and IBD
versus normal comparisons, and validated Taqman
assays were available. The measurements were done
using an ABI PRISM 7900HT Sequence Detection System
as described in the product's user guide. 6 The data
analysis was described earlier (28). For data analysis, the
SDS 2.2 software was used.
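
The ddCt values reported for RT-PCR can be related to fold changes with the usual comparative Ct arithmetic. The sketch below is an assumption-laden illustration, not the SDS 2.2 workflow: the function names, the reference (housekeeping) gene, and the Ct values are hypothetical, and it uses the convention ddCt = dCt(disease) - dCt(normal), under which a negative ddCt means higher expression in disease. Table 4 appears to list positive values for up-regulated genes, so the sign convention there may be the reverse.

```python
def delta_delta_ct(ct_target_disease, ct_ref_disease,
                   ct_target_normal, ct_ref_normal):
    """Return ddCt = dCt(disease) - dCt(normal).

    dCt is the target gene Ct minus the reference gene Ct in one sample group.
    """
    d_ct_disease = ct_target_disease - ct_ref_disease
    d_ct_normal = ct_target_normal - ct_ref_normal
    return d_ct_disease - d_ct_normal


def fold_change(dd_ct):
    """Fold change under the assumption of ideal (doubling) PCR efficiency."""
    return 2.0 ** (-dd_ct)


# Hypothetical mean Ct values, for illustration only.
dd_ct = delta_delta_ct(ct_target_disease=22.0, ct_ref_disease=18.0,
                       ct_target_normal=24.0, ct_ref_normal=18.0)
fc = fold_change(dd_ct)
```

In this toy case the target reaches threshold two cycles earlier in disease after normalization, which corresponds to a fourfold higher expression.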

Tissue Microarray Analysis and Blood Smear Immunocytochemistry.
Cores of [illegible] mm diameter were collected from
selected areas of formalin-fixed, paraffin-embedded
tissue blocks made from 89 early colorectal cancer (stage
Dukes B), 57 advanced colorectal cancer (stage Dukes C
and D), 84 IBD (32 Crohn's disease, 40 ulcerative colitis,
and 12 undeterminate IBD), and 62 normal colon samples


5 http://www.ncbi.nlm.nih.gov/geo/
6 http://www.appliedbiosystems.com
Table 3. Correlation between colorectal cancer versus
normal biopsy and peripheral blood results

Gene symbol   Probeset ID

Up-regulated in CRC compared with normal in both biopsy and
blood samples

TPM4          212481_s_at
SESTD1        [illegible]
TTYH3         224674_at
TIMP1         201666_at
CD44          212014_x_at
TM9SF4        [illegible]
PIM3          224730_at
PELO          218472_s_at
C6orf145      212923_s_at
SFXN3         [illegible]
MYL9          201058_s_at
CD44          210916_s_at
CD44          [illegible]
VCAN          221731_x_at
CD44          209835_x_at
VCAN          204620_s_at
VCAN          211571_s_at
TGFBI         201505_at
PLXND1        38671_at
TKT           208700_s_at
VCAN          216646_s_at
PF4           206390_x_at
CD44          1557905_s_at
IFITM3        212203_x_at
S100A11       [illegible]
NA            228910_at
G6PD          202275_at
AP1M1         223025_s_at
ZC3H12A       [illegible]
FSCN1         210933_s_at
NDE1          218414_s_at
IER3          201631_s_at
PEA15         200787_s_at
PTP4A3        [illegible]
IMPDH1        204169_at
PRKCDBP       213010_at
DDEF1         224786_at
ESAM          225369_at
CCDC85B       204610_s_at
MGC7036       227983_at
IFITM2        201315_x_at
IFITM1        201601_x_at
COL18A1       209082_s_at
RAB31         217762_s_at
FLNA          214752_x_at
TMEM158       213338_at
CTSK          202450_s_at
ENC1          201340_s_at
ICAM1         202638_s_at
INTS1         212212_s_at
PI3           203691_at
NA            227041_at

Down-regulated in CRC compared with normal in both biopsy
and blood samples

SLC26A2       224959_at
NA            227682_at
UGDH          203343_at

Up-regulated in CRC compared with normal in biopsy,
down-regulated in blood samples

RANBP2        201712_s_at
DNAJC10       225174_at
CRKRS         225694_at
SLC39A6       202088_at
SLC39A6       [illegible]
DIS3          222607_s_at
ELK3          221773_at
DNAJC10       229588_at
RANBP5        211953_s_at
IL8           202859_x_at
RANBP2        226922_at
SACS          213262_at
DNAJC10       221782_at
POT1          204354_at
GALNACT-2     218871_x_at
HS2ST1        203284_at
XPOT          212160_at

Down-regulated in CRC compared with normal in biopsy,
up-regulated in blood samples

MTMR11        205076_s_at
ETHE1         204034_at
SULT1A3       209607_x_at
C9orf19       225604_s_at
AGXT2L2       226519_s_at
SULT1A2       207122_x_at
FCGRT         218831_s_at
TRPM6         240389_at
SULT1A2       211385_x_at
SULT1A1       203515_x_at
ACADVL        200710_at
C22orf16      224932_at



of 122 patients and placed into recipient blocks. Tissue
sections of 5-µm thickness were cut from the blocks and
immunostained using the following antibodies: rabbit
anti-human osteopontin (1:2,000 dilution; Chemicon),
anti-osteonectin antibody (1:1,000 dilution; Chemicon),
rabbit anti-human biglycan (1:200 dilution; Atlas), mouse
anti-human collagen type IVα1 (1:300 dilution; Abcam,
clone: COL-94), mouse anti-human vascular endothelial
growth factor (1:2,000 dilution; Zymed, clone: VG1),
mouse anti-human von Willebrand factor (1:20; Dako,
clone: F8/86), and mouse anti-human platelet-endothelial
cell adhesion molecule 1 (1:40; Dako, clone: JC70A).
Signal conversion was achieved using the EnVision+ kit
(Dako) followed by the 3,3'-diaminobenzidine-hydrogen
peroxidase chromogen-substrate kit (Dako). Immunostained
tissue microarray (TMA) slides were digitalized
using a high-resolution Mirax Desk instrument (Zeiss)
and analyzed with the Mirax TMA Module software
(Zeiss). Protein expression was evaluated using an
empirical scale considering intensity and occupied
subcellular compartments of epithelial/carcinoma cells
or lamina propria cells. For statistical analysis, Pearson's
χ² test and Fisher's exact test were done.

Blood smears of 10 healthy and 10 colorectal cancer
patients were also immunostained using an anti-osteonectin
antibody (1:1,000 dilution; Chemicon) and Alexa
Fluor 488 F(ab')2 fragment of goat anti-mouse IgG. The
total and osteonectin-positive cells in 50 fields of view
with 30× magnification were counted in each sample. For
statistical analysis, a t test was done to evaluate the
difference of osteonectin-positive/total cell number ratios
between colorectal cancer and normal blood smears.



Results 

Classifiers between the Main Diagnostic Groups.
The minimal number of discriminatory transcripts with
high specificity and sensitivity values was determined
using prediction analysis of microarrays in each
comparison. Adenoma samples were distinguished from
hyperplastic polyps by 100% sensitivity and 90.91%
specificity, according to the expression level of minimally
nine genes including ATP-binding cassette family A,
member 8, insulin-like growth factor 1, and glucagon.
Sixty-one classifier probesets were identified between
colorectal cancer and adenoma, including axin 2, von
Willebrand factor, tensin 1, and gremlin 1 (sensitivity,
90.91% and specificity, 100%). IBD and normal biopsies
could be distinguished by 100% sensitivity and specificity
using only three classifiers (REG1A, MMP3, and
CHI3L1). According to the expression of 20 transcripts
(such as INDO, CXCL9, CCR2, CD38, RARRES3, and
CXCL10 transcripts), IBD and colorectal cancer samples
could be separated by 100% sensitivity and by 95.24%
specificity. Further details can be seen in Table 2.
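
The sensitivity and specificity percentages quoted above follow from the counts in a two-class confusion matrix. The sketch below shows the arithmetic only; the function name and the counts are hypothetical and were chosen merely because 10 of 11 correct calls gives a value of the same form as those reported, not because they are the study's actual counts.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) in percent.

    Sensitivity: share of truly positive samples that are called positive.
    Specificity: share of truly negative samples that are called negative.
    """
    sensitivity = 100.0 * true_pos / (true_pos + false_neg)
    specificity = 100.0 * true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(true_pos=10, false_neg=1,
                                     true_neg=11, false_pos=0)
```

With group sizes this small, a single misclassified sample moves the percentage by several points, which is worth keeping in mind when comparing classifiers.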



Beside pair-wise comparisons, the random forest
classification was also done to distinguish between the
above-mentioned diagnostic groups (Fig. 1). The estimated
prediction error was 12.9%. The main diagnostic
groups could be distinguished according to the mRNA
expression levels of 18 genes, including cell cycle and
cell proliferation regulatory genes (retinoic acid
responder 3, LATS large tumor suppressor homologue 2,
mutated in colorectal cancers, WARS), COP1 apoptosis
gene, HLA-DMA, APOL3, GBP2, SLAMF8 inflammatory
response related genes, SPARC-like 1 calcium-binding
extracellular matrix gene, SLC15A3 oligopeptide
transporter, as well as IFN regulatory factor 1, and quaking
homologue transcription and mRNA processing
genes. The exact functions of several classifier genes
(FAM26F, SAMD9L, GBP4, GIMAP5) have not yet been
identified.



Table 4. Taqman RT-PCR confirmation of the Affymetrix microarray results

Taqman ID       Gene symbol  Gene name                                     Affymetrix ID Sample groups  P < 0.05    ddCt
[illegible]     CD44         CD44 antigen                                  212014_x_at   AD vs normal   [illegible] 1.90
Hs00171022_m1   CXCL12       Chemokine (C-X-C motif) ligand 12             209587_at     AD vs normal   0.00305     -2.04
                                                                                         CRC vs normal  0.00736     -1.95
Hs00179845_m1   MET          Met proto-oncogene                            203610_at     AD vs normal   1.41E-06    2.17
                                                                                         CRC vs normal  0.00002     1.53
Hs00200350_m1   ABCA8        ATP-binding cassette, subfamily A             204719_at     AD vs normal   0.000610    -3.35
                             (ABC1), member 8                                            CRC vs normal  0.00143     -3.20
Hs00205545_m1   ADAMDEC1     ADAM-like, decysin 1                          206134_at     AD vs normal   [illegible] -3.65
                                                                                         CRC vs normal  9.18E-05    -2.74
Hs00214306_m1   TRPM6        Transient receptor potential cation channel,  224412_s_at   AD vs normal   5.79E-05    -4.73
                             subfamily M, member 6                                       UC vs normal   0.000385    -4.63
Hs00153408_m1   MYC          v-myc myelocytomatosis viral oncogene         202431_s_at   AD vs normal   [illegible] 2.35
                             homologue (avian)
Hs00171558_m1   TIMP1        Tissue inhibitor of metalloproteinase 1       201666_at     AD vs normal   3.90E-07    2.58
                                                                                         CRC vs normal  0.00153     2.74
                                                                                         UC vs normal   0.000219    2.36
Hs00236937_m1   CXCL1        Chemokine (C-X-C motif) ligand 1              204470_at     CRC vs normal  0.0114      3.84
                                                                                         UC vs normal   1.11E-05    4.04
[illegible]     CXCL2        Chemokine (C-X-C motif) ligand 2              209774_x_at   CRC vs normal  0.00204     3.70
                                                                                         UC vs normal   0.008592    3.68
[illegible]     CA1          Carbonic anhydrase I                          205950_s_at   AD vs normal   0.000930    -6.13
Hs00194353_m1   LCN2         Lipocalin 2                                   212531_at     AD vs normal   2.67E-07    6.13
                                                                                         CRC vs normal  0.000509    4.83
                                                                                         UC vs normal   2.15E-06    5.05
Hs00154230_m1   CALU         Calumenin                                     214845_s_at   CRC vs normal  0.0145      1.50
Hs00169795_m1   VWF          von Willebrand factor                         202112_at     CRC vs normal  0.55142
                                                                                         UC vs normal   0.000112    2.44
[illegible]     COL4A1       Collagen, type IV, α1                         211980_at     CRC vs normal  0.0283      3.38
Hs00156076_m1   BGN          Biglycan                                                    CRC vs normal  0.12042
Hs00169777_m1   PECAM1       Platelet/endothelial cell adhesion molecule   208983_s_at   CRC vs normal  0.76378
[illegible]     IL8          Interleukin 8                                 202859_x_at   CRC vs normal  0.0283      7.21
                                                                                         UC vs normal   6.80E-06    5.77
Hs00204187_m1   DUOX2        Dual oxidase 2                                219727_at     UC vs normal   7.84E-05    6.35
[illegible]     LIPG         Lipase, endothelial                           219181_at     AD vs normal   0.000588    1.35
                                                                                         CRC vs normal  0.00711     1.08
                                                                                         UC vs normal   0.000588    1.35
Hs00829485_sH   IFITM2       IFN induced transmembrane protein 2 (1-8D)    201315_x_at   CRC vs normal  0.00114     2.28
Hs00171061_m1   CXCL3        Chemokine (C-X-C motif) ligand 3              207850_at     CRC vs normal  0.00384     3.22
                                                                                         UC vs normal   7.48E-05    3.58
Hs00277299_m1   IL1RN        Interleukin 1 receptor antagonist             212657_s_at   CRC vs normal  0.00714     4.68
                                                                                         UC vs normal   1.10E-05    5.30
Hs00234573_m1   MMP9         Matrix metalloproteinase 9                    203936_s_at   UC vs normal   0.00724     [illegible]
Hs00160066_m1   PI3          Protease inhibitor 3, skin-derived (SKALP)    203691_at     UC vs normal   0.000257    4.28
Hs00197374_m1   UBD          Ubiquitin D                                   205890_s_at   UC vs normal   0.000261    3.20

Abbreviation: AD, adenoma.
Figure 2. Immunostainings in TMA sections. A. Osteonectin immunostaining in CRC (A1) and healthy colonic mucosa (A2). B.
Osteopontin immunostaining in CRC (B1) and healthy colonic mucosa (B2). C. Biglycan immunostaining in CRC (C1) and healthy
colonic mucosa (C2). D. Collagen 4A1 immunostaining in CRC (D1) and healthy colonic mucosa (D2). E. von Willebrand factor
immunostaining in CRC (E1) and healthy colonic mucosa (E2). F. MMP9 immunostaining in CRC (F1) and healthy colonic mucosa
(F2). G. VEGF immunostaining in CRC (G1) and healthy colonic mucosa (G2). H. PECAM1 protein expression in active IBD (H1)
and normal colonic tissue (H2). I. Collagen 4A1 protein expression in active IBD (I1) and healthy colonic mucosa (I2). The white
arrows show the colonic epithelial cells. Elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4a1, von Willebrand
factor, MMP9, and vascular endothelial growth factor were detected in CRC compared with healthy controls. In proportion to normal
tissue, overexpression of PECAM1 and collagen 4a1 proteins was found in IBD.



Identification of Subclassifier Transcripts. The
successful subdivision of IBD to ulcerative colitis and
Crohn's disease was achieved by the expression of 58
genes such as cyclin G2, dual oxidase 2, and CEACAM7
(sensitivity 77.78%, specificity 100%). Adenomas with
low-grade and high-grade dysplasia could be distinguished
using 65 classifier probesets such as aquaporin 1,
CXCL10, and complement factor 1 (sensitivity: 90.91%,
specificity: 100%). Early and advanced stage colorectal
carcinomas were differentiated by 34 discriminatory
transcripts including transmembrane protein 37,
interleukin 33, carbonic anhydrase 4, visinin-like 1,
ubiquitous calcium-transporting ATPase, and CDK inhibitor
2B by high specificity (100%) and somewhat lower
sensitivity values (86.67%; Table 2).

Expression of the Colorectal Cancer-Associated
Tissue Markers in Peripheral Blood. The differentially
expressed genes were determined by significance analysis
of microarrays between colorectal cancer samples
and healthy normal controls. The presence of these local
tissue-specific mRNA expression markers in peripheral
blood samples was also analyzed using the blood
samples of 19 colorectal cancer and 11 healthy patients.
Fifty-two transcripts were significantly up-regulated
both in biopsy specimen and the peripheral blood of
colorectal cancer patients compared with healthy normal
controls. Three genes (SLC26A2 sulfate transporter,
227682_at, and UDP-glucose dehydrogenase) showed
significantly decreased mRNA level both in colorectal
cancer biopsy and blood samples compared with
normals. In some colorectal cancer-related transcripts,
mRNA expression in blood changed in the opposite way
compared with their levels in cancer tissue. Seventeen
genes showing elevated mRNA expression in colorectal
cancer biopsy samples were down-regulated in the
peripheral blood of colorectal cancer patients, whereas
12 genes underexpressed in colorectal cancer tissue were
found to be overexpressed in colorectal cancer blood
samples (Table 3).
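
The four groups of transcripts described above amount to a simple comparison of the direction of change in tissue and in blood. A minimal sketch of that bookkeeping is given below; the function name is hypothetical, and the signed changes attached to the four gene symbols are invented for illustration (only their directions mirror the grouping in Table 3).

```python
from collections import Counter


def concordance_category(biopsy_change, blood_change):
    """Label a transcript by its direction of change in biopsy and in blood.

    Inputs are signed changes in CRC versus normal (positive = up-regulated).
    A change of zero is treated as "no call" and returns None.
    """
    if biopsy_change == 0 or blood_change == 0:
        return None
    biopsy = "up" if biopsy_change > 0 else "down"
    blood = "up" if blood_change > 0 else "down"
    if biopsy == blood:
        return f"{biopsy} in both"
    return f"{biopsy} in biopsy, {blood} in blood"


# Hypothetical signed changes (biopsy, blood), for illustration only.
example = {
    "TIMP1": (1.8, 0.9),
    "UGDH": (-1.2, -0.6),
    "RANBP2": (0.7, -0.5),
    "SULT1A1": (-1.5, 0.8),
}
category_counts = Counter(
    concordance_category(biopsy, blood) for biopsy, blood in example.values()
)
```

Applied to a full list of significant transcripts, the resulting counts would correspond to the group sizes reported in the text (52, 3, 17, and 12).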

Taqman RT-PCR Validation of 26 Selected Genes.
The expression of all the 11 (6 up-regulated and 5
down-regulated in microarray analysis) adenoma-associated
genes, 15 of the 18 colorectal cancer-related genes
(15 overexpressed and 3 underexpressed), and all the
14 ulcerative colitis-associated genes (13 up-regulated
and 1 down-regulated) correlated significantly with the
Affymetrix results (P < 0.05). On average, the mRNA
expression of 93% of the selected genes was verified by
Taqman RT-PCR (Table 4).

TMA Analysis and Blood Smear Immunocytochemistry
Results. In accordance with mRNA expression
results, elevated protein levels of osteonectin, osteopontin,
biglycan, collagen 4a1, von Willebrand factor,
MMP9, and vascular endothelial growth factor were
detected in colorectal cancer compared with healthy
controls. Moderate cytoplasmatic osteopontin and
osteonectin staining was found in the apical cytoplasm of
epithelial cells in healthy colon tissue. Both osteonectin
and osteopontin showed moderate to strong diffuse
cytoplasmatic staining in colorectal cancer samples.
Osteonectin protein expression was also significantly
increased in blood smears of colorectal cancer patients
(osteonectin-positive mononuclear cells, 20.89% ± 2.16%)
compared with the normal (6.72% ± 2.65%; P = 6.35 ×
10⁻⁹; Supplementary Fig. S1). In colorectal cancer cases,
strong subepithelial BGN immunostaining was found in
lamina proprial myofibroblast-like cells and leukocytes.
No epithelial BGN immunoreactivity was detected. Most
of the normal samples were negative for BGN, but in
some cases weak apical epithelial BGN immunostaining
was found, and no subepithelial labeling was seen.
Whereas all normal samples were negative for Col4A1,
certain carcinomatous cells showed a moderate to strong
epithelial Col4A1 immunostaining in colorectal cancer
samples. There was no lamina propria immunoreactivity.
Regarding vWF, there was moderate epithelial
immunostaining in carcinomatous cells in colorectal cancer
samples, and some vWF immunoreactivity was also seen
scattered in the lamina propria, whereas in normal cases
no mucosal immunostaining was seen. Subepithelial
MMP9 immunostaining was found to be moderate and
strong in lamina proprial leukocytes in colorectal cancer
cases but not in carcinomatous epithelium. A diffuse
weak intracytoplasmatic epithelial immunoreactivity
was seen in normal samples. In case of vascular
endothelial growth factor, epithelial immunoreactivity
was found to be moderate to strong diffusely in
carcinomatous cells of colorectal cancer samples. The
subepithelium showed a moderate reaction. Weak to
moderate subepithelial and luminal epithelial vascular
endothelial growth factor immunoreactivity was found
in almost all normal samples (Fig. 2).

In comparison with normal tissue, PECAM1 and
collagen 4a1 proteins were overexpressed in IBD in
accordance with the up-regulated mRNA levels detected
by microarrays. In IBD samples there was a strong
subepithelial PECAM1 immunoreaction in lamina proprial
leukocytoid cells. There was no epithelial immunoreaction
in any of the normal samples. In several IBD
samples a weak Col4A1 immunoreaction was found
compared with normals. No subepithelial immunostaining
could be detected (Fig. 2).



Discussion 

In this study, 85 colonic biopsy samples and 30
peripheral blood samples were analyzed in total by
whole genomic expression microarrays to identify local
tissue classifiers between the diagnostic groups and to
analyze the presence of the tissue expression markers in
peripheral blood.

In the daily routine, the situation where the biopsy 
sample taken during the endoscopic intervention is not 
evaluable in the appropriate manner by conventional 
histology occurs relatively frequently. Diagnostic expres- 
sion profile from the whole biopsy specimen can 
overcome the sampling error failures in histology. 

For the objective, molecular-based classification of the
biopsy samples into main diagnostic groups, classifier
transcript sets were determined. Functional analysis of
significant genes can provide important information,
because with the identification of the main signaling
pathways, the key genes characterizing the given
pathomechanism can be found and used for diagnostic
analysis.

Because IBD, especially long-standing ulcerative
colitis, is a precancerous condition, the analysis of
IBD specimens is important for the early identification of
genes related to the adenoma-dysplasia-carcinoma sequence.

All three IBD classifiers have been hypothesized to
show increased expression in IBD. In case of a tissue
injury associated with IBD, REG1A (regenerating islet-
derived 1α) mRNA was observed to be highly expressed
in colonic mucosa (38). The protein product of this gene
has a positive regulatory effect on cell proliferation (39),
and may contribute to reduce epithelial apoptosis in
inflammation (38). Matrix metalloproteinase 3 (MMP3),
involved in wound repair and tumor initiation, was
also up-regulated in IBD (40). Microarray analysis done
by Mizoguchi et al. indicated that the third classifier,
chitinase 3-like 1 (CHI3L1), is overexpressed specifically
in inflamed mucosa. CHI3L1 plays a pathogenic role in
colitis, presumably by enhancing the adhesion and
invasion of bacteria on/into colonic epithelial cells (41).
Dysregulated host/microbial interactions seem to play a
central role in the pathogenesis of IBD.

Analyzed by function, most of the colorectal cancer
versus adenoma discriminatory genes are involved in
intracellular signal transduction (GNG11, latrophilin,
AKAP12, ELTD1, tensin 1, axin 2, GNB4, ELTD1), cell
proliferation (IGFBP3, MCC, LATS2), cell adhesion
(ROBO1, AEBP1, VWF, collagen 15A1, DDR2, PLEKHC1),
and transcription regulation (like NR3C1, WWTR1,
MEIS1, MEF2C, SNAI2). However, the functions of
several discriminatory transcripts are still unknown. For
instance, gremlin 1 (GREM1), which is represented among
the classifiers with two probesets, as an antagonist of
BMP may play a role in regulating organogenesis, body
patterning, and tissue differentiation. It was overexpressed
in various human tumors including carcinomas
of the lung, ovary, kidney, breast, colon, pancreas, and
sarcoma (42).
Polyps could be classified into adenomatous and
hyperplastic polyps according to the expression levels
of nine transcripts. The ABCA8 ABC transporter, which
was previously found to be underexpressed in colorectal
cancer (28, 43), showed decreased expression in adenoma
compared with hyperplastic polyp samples. Lower
glucagon mRNA levels in adenomas may refer to the
altered intestinal barrier function (44) and disordered cell
proliferation regulation. Interestingly, IGF1, overexpression
of which is closely associated with the early stage
of colorectal carcinogenesis (45), was found to be more
intensely expressed in hyperplastic polyps than in
adenomas. The lower peroxiredoxin 6 expression may
indicate weaker protection against oxidative stress in
adenomas. The exact functions of the MAMDC2, C2orf32,
229570_at, and KIAA1199 discriminatory transcripts
have not yet been clarified.

The colorectal cancer versus IBD discriminatory genes
are mainly immune and defense response-related genes
(like CXCL9, CXCL10 chemokine ligands, CCR2, CCRL1
chemokine receptors, interleukin 18 binding protein,
GBP1, GBP5, NOS2A, INDO, TNFSF13B, toll-like receptor
8, 227458_at), which showed decreased mRNA levels
in colorectal cancer compared with IBD samples. CD38,
expressed mainly in leukocytes, is involved in cell
adhesion and signal transduction; RARRES3 is a negative
regulator of cell proliferation, whereas ECGF1 is a
growth factor with angiogenic effects. RARRES3 has
been reported to act as a tumor suppressor or growth
regulator (46). Its decreased expression in colorectal
cancer seems to support this assumption. Autocrine
production of ECGF1 by endothelial cells may be a
mechanism of inflammatory angiogenesis but not tumor
angiogenesis and might be particularly important for the
maintenance of damaged vasculature in IBD (47). The
functions of some newly identified expression markers
(FAM26F, FCRL5, SAMD9L, TNIP3) are unclarified.

The main diagnostic groups (colorectal cancer, IBD,
adenomas, hyperplastic polyps) can be distinguished
according to the mRNA expression levels of 18 genes
determined by the random forest classification method
with a 12.3% prediction error.

Besides the objective classification of the samples into
main diagnostic groups, the differentiation among
disease subtypes is also important for the improvement
of the molecular-based diagnostics.

A relatively high number of classifiers is required for 
differentiation between high-grade and low-grade 
dysplastic villous adenomas. Several tumorigenesis-related 
discriminatory transcripts (such as HIPK1, CDC25B, 
CXCL1, and HMGA2) were found to be overexpressed 
in high-grade dysplasia adenoma, reflecting the high 
risk of colorectal cancer development (13, 14, 43, 48, 49). 
Homeodomain interacting protein kinase 1 (HIPK1) may 
thus play a role in tumorigenesis, perhaps by regulating 
the expression of p53 and/or Mdm2 (48). A correlation 
has previously been shown between the presence of 
HMGI proteins and the expression of a highly malignant 
phenotype in epithelial and fibroblastic rat thyroid cells. 
Moreover, HMGA2 seems to be involved in colorectal 
carcinogenesis (49). 

Most of the colorectal cancer subtype classifiers are 
involved in transport processes (calcium ion transport: 
transmembrane protein 37, ubiquitous calcium-transporting 
ATPase, CLIC6 chloride transporter, CYP4X1 electron 
transporter, GABRB2 chloride channel, SLC26A2 
sulfate transporter), in metabolic processes (carbonic 
anhydrase 4, UDP glucuronosyltransferase 2 A3 
polypeptide, glycosyltransferase-like 1B, monoacylglycerol 
O-acyltransferase 2), in cell adhesion and motility 
(espin, mucin-like protocadherin, tetraspanin 5), in signal 
transduction (visinin-like 1, C13orf18), and in cell cycle 
regulation (SPC25 kinetochore complex component, CDK 
inhibitor 2B). Visinin-like 1 (VSNL1) was overexpressed 
in neuroblastoma tumor specimens from patients with 
distant organ metastases compared with those without 
metastases (50). Decreased expression of the CDKN2B 
(alias p15) tumor suppressor gene is also typical in 
advanced colorectal cancer. 

The future perspective is to establish the diagnosis and 
to perform screening using a more easily available 
sample source such as peripheral blood, so that the further 
required diagnostic and therapeutic steps can be based 
on such samples. However, WBC circulating in 
the peripheral blood tour all tissues of the body, and 
gene expression changes in them are affected by more 
conditions than the gene expression patterns in local 
tissue alterations. It is therefore important to find tissue 
markers that also appear in peripheral blood and can 
be specific for a given organic alteration. 

Several colorectal cancer-associated tissue markers 
changed in peripheral blood in parallel with the locally 
measured expression levels. Genes showing up-regulation 
in both biopsy and peripheral blood samples of colorectal 
cancer patients compared with normal controls are 
mainly involved in cell adhesion (like CD44, TGFb1, 
ICAM1, versican, collagen 18A1, pelota homologue 
endothelial cell adhesion molecule), cell proliferation 
(such as IFITM1, IFITM2, TIMP1, fascin homologue 1), 
and intracellular signal transduction (including S100A11, 
filamin A, and DDEF1), whereas the functions of nine 
transcripts (like CCDC85B, TM9SF4, C6orF14o\ and 
TMEM158) have not yet been identified. The gene 
signals may come from peripheral blood mononuclear 
cells, as well as from circulating tumor cells. Previously, 
we reported a significantly positive correlation between 
the number of circulating tumor cells and clinical 
properties of colorectal cancer (51). The underexpressed 
genes in both biopsy and blood samples are involved in 
metabolism (UGDH) and sulfate transport (SLC26A2), 
whereas the function of 227682_at is unknown. In some 
colorectal cancer-related transcripts, mRNA expression 
in blood changed in the opposite way compared with 
their levels in cancer tissue. This phenomenon may 
relate to secondary immunologic processes including 
tumor-infiltrating lymphocytes rather than circulating 
tumor cells. 

The expression of selected IBD- and colorectal 
cancer-associated genes was also measured at protein level, on 
222 tissue sections of 29 overlapping and 93 independent 
sets of patients. TMA technology allowed the standardized 
analysis of a large number of samples within a 
short time and the validation of some of the mRNA 
expression results. In accordance with mRNA expression 
results, elevated protein levels of osteonectin, osteopontin, 
biglycan, collagen 4A1, von Willebrand factor, 
MMP9, and vascular endothelial growth factor were 
detected in colorectal cancer compared with healthy 
controls. Osteonectin protein expression in blood smears 
of colorectal cancer patients was also significantly 
elevated compared with normal controls. Overexpression 
of PECAM1 and collagen 4A1 proteins was detected 
in IBD compared with normal tissue, in accordance with 
the up-regulated mRNA levels detected by microarray. 

In conclusion, whole genomic microarray analysis 
using routine biopsy samples may be suitable for the 
identification of discriminative signatures for differential 
diagnostic purposes. Our results may serve as a basis of 
new gene expression pattern-based diagnostic methods 
like TaqMan and/or LightCycler 480 real-time PCR 
cards. As the mRNA expression results showed a strong 
correlation with the protein level expression, simultaneous 
analysis of protein marker sets can also take place. 
Nowadays, antibodies recognizing a wide range of 
proteins in formol-paraffin tissue sections are available, 
offering immunostaining of disease-specific markers as a 
simple test for daily diagnostic utilization. 

DETECTION OF SEQID_7 RNA BY RT-PCR 

A quantitative TaqMan assay was developed to evaluate the relative expression of the SEQID_7 
sequence derived from the discovery of clones 8.2d (A) and 12.2f (B). The expression of SEQID_7 
was measured in 20 normal (blue) and 51 adenoma (red) colorectal tissue specimens using the 
oligonucleotide primers 5'- CAGACmACATCATGGGTGACCA (forward) and 
5'- GCCATCCTGTGGCCCC (reverse), together with a TaqMan probe, 
TCCCGCAGAGTTGTACAGMCCTCCC, targeting a 162nt segment located from nucleotides 3364 to 
3432. Resulting expression data were normalised relative to HPRT1 using the Livak method 
(2^(-ΔΔCT)) and are displayed as box-and-whisker diagrams, where the 'boxes' give the median values 
and the lower and upper quartiles of the datasets. 
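
The normalisation described above can be sketched as follows. This is only an illustration of the Livak calculation and of the quartiles a box plot displays; the Ct values are hypothetical, the function names are invented for this sketch, and the choice of the median normal-tissue ΔCT as calibrator is an assumption, since the text does not state which calibrator was used.

```python
from statistics import median, quantiles


def livak_relative_expression(ct_target, ct_reference, calibrator_delta_ct):
    """Relative expression 2^(-ddCt) of one sample (Livak method)."""
    delta_ct = ct_target - ct_reference              # normalise to the reference gene
    delta_delta_ct = delta_ct - calibrator_delta_ct  # compare with the calibrator
    return 2.0 ** (-delta_delta_ct)


def box_summary(values):
    """Lower quartile, median and upper quartile, as drawn by the 'box'."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical (target Ct, HPRT1 Ct) pairs; these are NOT data from the study.
normal_ct = [(30.0, 24.0), (31.0, 24.5), (30.5, 24.0)]
adenoma_ct = [(25.0, 24.0), (26.0, 24.5), (24.5, 24.0)]

# Assumption: the calibrator is the median dCt of the normal group.
calibrator = median(t - r for t, r in normal_ct)

normal_rel = [livak_relative_expression(t, r, calibrator) for t, r in normal_ct]
adenoma_rel = [livak_relative_expression(t, r, calibrator) for t, r in adenoma_ct]

print("normal  (Q1, median, Q3):", box_summary(normal_rel))
print("adenoma (Q1, median, Q3):", box_summary(adenoma_rel))
```

With this convention a sample whose ΔCT equals the calibrator has a relative expression of 1, and each cycle by which ΔCT falls below the calibrator doubles the value.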



FIGURE 5 



DEMONSTRATION OF DIAGNOSTIC UTILITY OF SEQID_7. 

(A) Eleven oligonucleotide probes were designed to monitor the expression of SEQ ID 7 RNA in 
colon tissue specimens from 30 normal subjects (left bars) and 21 subjects with either adenomas 
(middle bars) or colorectal cancer (right bars) using a custom-made microarray gene chip. The CG 
Custom GeneChips were processed according to the manufacturer's instructions for the Affymetrix 
HuGene ST 1.0 array using biotinylated DNA derived from an original primary RNA concentration of 
100 ng. Hybridization to the Custom Chip CG__AGPa520460F was carried out at 45°C for 16-18 
hours. Finally, the chips were washed, stained and scanned as recommended by Affymetrix using 
an Affymetrix Scanner 3000. Transcript expression levels were calculated by both Microarray Suite 
(MAS) 5.0 (Affymetrix) and the Robust Multichip Average (RMA) normalization techniques 
(Affymetrix. GeneChip expression data analysis fundamentals. Affymetrix, Santa Clara, CA USA, 
MM; Hubbell et al. Bioinformatics, 18:1585-1592, 2002; Irizarry et al. Nucleic Acids Research, 31, 
2003). MAS normalized data were used for performing standard quality control routines and the final 
data set was normalized with RMA for all subsequent analyses. 

(B) The average expression across the eleven oligonucleotide probes described in (A). 
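
As a minimal sketch of the per-sample averaging described in (B), assuming the probe-level values have already been normalized and sit on a common scale: the function name and the numbers below are hypothetical and are not taken from the exhibit.

```python
from statistics import fmean


def average_probe_expression(probe_values):
    """Mean expression of one sample across its probe-level values."""
    if not probe_values:
        raise ValueError("at least one probe value is required")
    return fmean(probe_values)


# Hypothetical normalized values for eleven probes in two samples.
samples = {
    "normal_01": [5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.8, 5.0, 5.1, 4.9, 5.0],
    "adenoma_01": [9.0, 9.2, 8.9, 9.1, 9.3, 9.0, 8.8, 9.1, 9.2, 9.0, 9.1],
}

for name, values in samples.items():
    print(name, round(average_probe_expression(values), 2))
```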

Abstract 

Colorectal cancers are believed to arise predominantly 
from adenomas. Although these precancerous lesions 
have been subjected to extensive clinical, pathologic, 
and molecular analyses, little is currently known about 
the global gene expression changes accompanying their 
formation. To characterize the molecular processes 
underlying the transformation of normal colonic 
epithelium, we compared the transcriptomes of 32 
prospectively collected adenomas with those of 
normal mucosa from the same individuals. Important 
differences emerged not only between the expression 
profiles of normal and adenomatous tissues but also 
between those of small and large adenomas. A key 
feature of the transformation process was the 
remodeling of the Wnt pathway reflected in patent 
overexpression and underexpression of 78 known 
components of this signaling cascade. The expression 
of 19 Wnt targets was closely correlated with clear 
up-regulation of KIAA1199, whose function is currently 
unknown. In normal mucosa, KIAA1199 expression 
was confined to cells in the lower portion of intestinal 
crypts, where Wnt signaling is physiologically active, 
but it was markedly increased in all adenomas, where it 
was expressed in most of the epithelial cells, and in 
colon cancer cell lines, it was markedly reduced by 
inactivation of the β-catenin/T-cell factor(s) transcription 
complex, the pivotal mediator of Wnt signaling. Our 
transcriptomic profiles of normal colonic mucosa and 
colorectal adenomas shed new light on the early stages 
of colorectal tumorigenesis and identified KIAA1199 as a 
novel target of the Wnt signaling pathway and a putative 
marker of colorectal adenomatous transformation. 
(Mol Cancer Res 2007;5(12):1263-75) 

Introduction 

In developed countries, sporadic adenomatous colorectal 
polyps are found in roughly one third of asymptomatic adults 
below the age of 50 who undergo colonoscopy. Depending on 
their characteristics (multiplicity, size, histologic features, and 
degree of dysplasia), these lesions can be associated with a 
substantial risk of recurrence (up to 60% at 3 years) and the 
development of advanced neoplastic disease (reviewed in ref. 1 
and references therein). It has been estimated that 15% of all 
adenomas measuring ≥1 cm will progress to carcinomas within 
10 years of their detection (2). 

Although adenomatous polyps are not the only precancerous 
lesions in the colorectum, they are the most common, and they 
are the precursors of most of the cancers in this organ. In these 
neoplasms, the transformation process begins in the epithelial 
crypts and seems to result from qualitative, quantitative, and 
spatial subversion of the Wnt signaling pathway, the 
physiologic regulator of epithelial homeostasis (3-5). This 
adenoma-carcinoma pathway of tumorigenesis is characterized by 
mutations involving various components of this pathway 
(e.g., APC, whose germ-line mutations are responsible for 
familial adenomatous polyposis; CTNNB1, which encodes a 
subunit of the cadherin protein complex known as β-catenin; 
and Axin, the gene encoding a multidomain scaffold protein 
that is essential for β-catenin degradation). The result of these 
mutations is an accumulation of β-catenin, first in the 
cytoplasm and then in the nucleus, where it associates with 
DNA-binding proteins of the T-cell factor (TCF)/lymphoid 
enhancer factor family, transforming them from transcriptional 
repressors into transcriptional activators that affect the 
expression of numerous genes involved in epithelial homeostasis. 

Although the key role played by adenomatous polyps in 
colorectal tumorigenesis is widely acknowledged, the gene 
expression changes that trigger or accompany their development 
have never been comprehensively studied. We therefore 
conducted a transcriptomic analysis of prospectively collected 
colorectal adenomas using a standardized oligonucleotide 
microarray covering the entire human genome. This study not 
only provided new information that is fundamental for future 
molecular characterization of these precancerous lesions but 
also allowed us to identify a putative marker of colorectal 
tumorigenesis. 

Results 

The focus of our study was the adenoma-adenocarcinoma 
pathway of colorectal carcinogenesis, which is closely linked to 
deregulation of the Wnt signaling pathway. To gain insight into 
the early steps of this process, we confined our investigation 
exclusively to sporadic, pedunculated colorectal adenomas 
(type 0-Ip of the Paris classification; ref. 6). Nonpolypoid and 
sessile polypoid lesions were not included because in some 
cases their transformation is believed to proceed along 
nonadenomatous pathways (7). Details on our case selection 
criteria are provided in Materials and Methods. 



Thirty-two pedunculated adenomatous polyps, each with 
matched samples of normal mucosa, were prospectively 
collected from 28 patients (Table 1). The total number of 
synchronous and previously excised adenomas was <3 in 18 of 
28 patients and 3 to 15 in the remaining 10. In this latter 
subgroup, the absence of APC- or MYH-associated multiple 
adenomatosis had been confirmed by genetic testing of 
lymphocyte DNA. Histologic analysis of one polyp (case 
MM) revealed superficial infiltration of the submucosa, but this 
case was not excluded because the region sampled for 
microarray analysis was clearly adenomatous. (As noted below, 
this finding was consistent with the results of hierarchical 
cluster analysis shown in Supplementary Fig. S1.) 

Analysis of microarray data for the 32 adenoma/normal 
mucosa tissue pairs revealed that 31,033 of the probes were 
expressed in one or both of the tissue groups. The normal 
tissues were effectively segregated from the adenomas in four 
unsupervised analyses of the expression levels of these genes 
[hierarchical clustering, principal component analysis (PCA), 
correlation analysis, and correspondence analysis (CA); see 
Materials and Methods for details; Fig. 1]. In a separate 



Table 1. Characteristics of the 28 Patients with Adenomatous Polyps Included in the Study 



Patient | Age (y) | Sex | Colon segment involved | Maximum adenoma diameter (mm) | Microscopic appearance | Highest degree of dysplasia | Degree of dysplasia at sampling site* | No. adenomas at study colonoscopy† | No. previously excised adenomas‡ | No. previous and/or synchronous hyperplastic polyps | Familiarity for colorectal cancer (relative, onset age) 

Abbreviations: M, male; F, female; A, ascending colon; T, transversum; D, descending colon; S, sigmoid colon; R, rectum; T, tubular; TV, tubulovillous; L, low-grade 
dysplasia; H, high-grade dysplasia. 

*Low-grade versus high-grade dysplasia as defined by the WHO classification of tumors of the digestive system, editorial and consensus conference in Lyon, France, 
November 6-9, 1999, IARC. 

†This number includes the adenoma(s) subjected to microarray analysis. 
‡Total number of adenomas detected and excised during previous colonoscopies. 
§Two adenomas from these patients were analyzed. 
∥These cases were considered as recurrent adenomas for the CCA. 

¶The index colonoscopy was done in a different center about 10 y before the study colonoscopy. 
**Hyperplastic polyposis. 
††No previous colonoscopies. 

‡‡Superficial submucosal invasion (T1). The tissue collected for microarray came from the adenomatous portion of the polyp. 
analysis, these two tissue groups were also unequivocally 
distinguished from a previously described set of 25 colon 
cancers (8), which we reanalyzed for this study with the same 
microarray used to characterize the adenomas and normal 
mucosa (Supplementary Fig. S1). 

Almost half of the expressed probes (15,059 of 31,033) 
displayed significant expression changes in adenomas. Those 
with fold changes ≥2 (1,190 probes up-regulated and 2,469 
down-regulated in adenomas) were subjected to gene ontology 
analysis to identify the biological processes involved in the 
transition of normal mucosa to adenoma. The most significant 
results of this analysis are listed in Supplementary Table S1. 
The processes that were most markedly overrepresented among 
genes that were up-regulated in adenomas included mitosis, 
DNA replication, and spindle organization. Down-regulated 
genes were predominantly involved in host immune defense, 
inorganic anion transport, organ development, and 
inflammatory response, although a small group of genes involved in the 
latter process was up-regulated in adenomas (Supplementary 
Fig. S2). 
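
The fold-change selection step mentioned above can be illustrated with a short sketch. It assumes linear-scale mean intensities per probe and applies only the twofold cut-off; the significance testing that preceded this step in the study is not reproduced, and the probe names and values are hypothetical.

```python
def split_by_fold_change(probes, threshold=2.0):
    """Split probes into up- and down-regulated lists by adenoma/normal ratio.

    `probes` maps a probe name to (normal_mean, adenoma_mean) on a linear scale.
    """
    up, down = [], []
    for name, (normal_mean, adenoma_mean) in probes.items():
        ratio = adenoma_mean / normal_mean
        if ratio >= threshold:
            up.append(name)            # at least `threshold`-fold higher in adenoma
        elif ratio <= 1.0 / threshold:
            down.append(name)          # at least `threshold`-fold lower in adenoma
    return up, down


# Hypothetical intensities; not data from the study.
example = {
    "probe_a": (100.0, 450.0),   # 4.5-fold up
    "probe_b": (200.0, 50.0),    # 4-fold down
    "probe_c": (120.0, 150.0),   # 1.25-fold, below the cut-off
}

up_regulated, down_regulated = split_by_fold_change(example)
print(up_regulated, down_regulated)
```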

We then analyzed the transcript levels of 319 genes believed 
to be components of the complex Wnt signaling pathway 
(Supplementary Table S2). Sixty-six of these genes (21%) were 
not expressed in either the normal or adenomatous tissue, 
and 34% were expressed similarly in both tissue groups. The 
remaining 144 genes displayed significantly altered expression 
in adenomas, and 78 of 144 displayed fold changes of ≥2. 
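
The counts in this paragraph can be cross-checked with a few lines of arithmetic. The number of similarly expressed genes (109) is not stated in the text; it is derived here as the remainder, on the assumption that the three categories are exhaustive and mutually exclusive.

```python
total_wnt_genes = 319
not_expressed = 66
altered = 144
altered_twofold = 78

# Derived value (assumption: the three categories partition the 319 genes).
similarly_expressed = total_wnt_genes - not_expressed - altered


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print("not expressed:      ", percent(not_expressed, total_wnt_genes), "%")
print("similarly expressed:", percent(similarly_expressed, total_wnt_genes), "%")
print("altered:            ", percent(altered, total_wnt_genes), "%")
print("altered >= 2-fold:  ", percent(altered_twofold, altered), "% of altered genes")
```

The derived remainder reproduces the 21% and 34% quoted in the text, which supports reading the garbled figures as 66, 144, and 78.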

A supervised extension of CA (9), canonical CA (CCA), 
was then used to identify possible correlations between gene 
expression patterns and clinical or pathologic variables. Four 
of the variables considered (adenoma diameter, colon segment 
of origin, degree of dysplasia, and adenoma recurrence; see 
Table 1) were clearly associated with distinct clusters of 
expression profiles (Fig. 2, variables in A and clusters for 
adenoma diameter in B; more details in the legend to this 
figure). The profile of adenomas measuring >20 mm could be 
easily distinguished from those of smaller (≤20 mm) adenomas. 
As shown by CCA and visualized on the corresponding CCA 
score plot (Fig. 2B), the centers of the three adenoma size 
clusters are distributed along the principal CCA axis (the 
vertical axis in Fig. 2B, the most important axis of separation of 
the expression profiles) in a definite order, with increasing 
diameters corresponding to progressively higher CCA scores. 
The variable large adenoma diameter was closely correlated 
with the vertical CCA axis (i.e., its vector "d>20mm" in 
Fig. 2A is almost parallel to this axis). It is interesting to note 
that the same correlation can be observed for the variable 
high-degree dysplasia (i.e., represented in Fig. 2A by vector "Hd"). 
This finding confirms the expected correlation between larger 
diameters and higher-degree dysplasia. 

The CCA plot of the 11,709 modeled probes (loading plot, 
not shown) suggested that the distinction between the three size 
groups of adenomas is due to a complex network of relatively 
small changes in the expression of numerous genes (as opposed 
to marked changes involving a limited number of genes). 
Nevertheless, to maximize the use of the extensive data sets, we 
selected the 500 probes with the highest loading scores along 
the CCA axis 1 and isolated a set of genes whose expression 
changes displayed significant positive or negative correlation 
with adenoma size (Supplementary Table S3). Although their 
association with adenomas must be validated in a larger series, 
these are the expression changes most likely to play causal roles 
in the progression of these tumors. 

It should be mentioned that normal mucosa from the sigmoid 
colon had an expression profile that differed significantly from 
that of tissues from other colon segments (Fig. 2A). This 
finding will be explored in a future study conducted on a large 
series of normal mucosa samples from different colorectal 
segments. 

The transcriptional profile of the 32 adenomas was 
thoroughly analyzed to identify genes likely to be involved 
in the development and evolution of these lesions. One of the 
first features that attracted our attention was the marked 
up-regulation of KIAA1199 (Supplementary Table S4), a gene 
encoding a protein with unknown function. Its overexpression 
was striking in all colorectal adenomas we examined (average 
increases of 54.8-fold compared with normal mucosa) and in 
carcinomas (8). These findings were fully confirmed by 
real-time reverse transcription-PCR analysis of RNA extracted from 
samples used for the microarray study and from additional 
samples collected after the present study was completed 
(Supplementary Fig. S3). 

In light of these findings, it was natural to wonder whether 
KIAA1199 might be a novel positively regulated target of Wnt 
signaling, which is characteristically deregulated in colorectal 
tumors. Previous microarray studies indicated that genes 
coregulated at the transcriptional level under different 
conditions tend to be involved in the same processes and pathways, 
and the analysis of transcriptional coexpression has been used 
to predict the function of novel genes (10-12). Therefore, we 
conducted a search for known Wnt targets (listed in 
Supplementary Table S5) among the genes whose expression 
patterns in all the tissue samples significantly correlated with 
those of KIAA1199. (The procedure used in this analysis is 
summarized in Materials and Methods and Supplementary 
Fig. S4.) Forty-nine percent of the known Wnt targets that were 
overexpressed in our adenoma samples had expression patterns 
that were positively correlated with that of KIAA1199 (Fig. 3A 
and B) as opposed to only 7.9% of the overexpressed genes that 
are not considered Wnt targets (P < 0X001). 
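
A coexpression search of the kind described above compares each gene's expression profile across samples with that of a reference gene. The sketch below uses the Pearson coefficient as an assumed measure, since this passage does not name the statistic (the actual procedure is given in Materials and Methods); the profiles, gene labels, and the 0.9 cut-off are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical normalized expression across four samples; not study data.
reference_profile = [1.0, 2.0, 3.0, 4.0]      # stands in for KIAA1199
candidates = {
    "gene_a": [2.0, 4.1, 5.9, 8.0],           # rises with the reference
    "gene_b": [5.0, 4.0, 2.5, 1.0],           # falls as the reference rises
}

# Assumed cut-off for calling a profile "positively correlated".
coexpressed = [g for g, p in candidates.items()
               if pearson(reference_profile, p) > 0.9]
print(coexpressed)
```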

Evidence of the potential involvement of KIAA1199 in the 
Wnt signaling pathway had also emerged from another study by 
our group (13). A combined analysis of microarray data of 
tissues and cell lines placed KIAA1199 at the top of a list of 
genes [Supplementary Table S1 of ref. 13] that were 
up-regulated in colorectal adenomas and down-regulated in colon 
cancer cell lines that had undergone stable transfection with 
doxycycline-inducible forms of dominant-negative TCF1 or 
TCF4 to suppress Wnt signaling (14, 15). In the present study, 
KIAA1199 was also found to be markedly down-regulated in 
LS174T colon cancer cells in which Wnt signaling had been 
blocked by the induction of β-catenin small interfering RNA or 
NH2-terminal-deleted TCF4 (15, 18). The dramatic decrease in 
KIAA1199 mRNA levels associated with this inhibition of the 
Wnt pathway was confirmed by Northern blotting (Fig. 3C). 

In general, Wnt target genes are expressed predominantly in 
the proliferating compartment of normal intestinal crypts (lower 
portion), and their expression is appreciably increased in 

FIGURE 2. Clinical/pathologic variables that correlate with distinct gene expression profiles. The panels summarize the most important results of the CCA 
of mRNA intensity log-ratio values (adenoma/normal) of expressed genes. For clarity, CCA axis 1 has been drawn vertically in both panels. A. Correlation 
between specific clinical/pathologic variables (adenoma diameter, colon segment of origin, degree of dysplasia, and adenoma recurrence) and clusters of 
differential gene expression profiles (coded as log-ratio profiles), such as those shown in B. Each vector represents a specific value for a given variable (e.g., 
adenoma diameter of >20 mm and high-degree dysplasia) and points toward the center of the profile cluster correlated with the clinical/pathologic 
characteristic it represents. If the centers for each specific value are separated, the corresponding vectors point in distinct directions; otherwise, they are 
directed toward the same point. In the former case, the represented variable can be assumed to be significantly correlated with the profiles; in the latter case, 
there is no correlation. The length of the vector reflects the strength of the correlation: those approaching the circumference of the correlation circle, which 
represents a correlation value of 1, indicate stronger correlation than shorter vectors (correlation closer to 0). d, diameter; Hd, high-degree dysplasia; Ld, 
low-degree dysplasia; A, ascending colon; T, transverse colon; D, descending colon; S, sigmoid colon; R, rectum; Rec, recurrent adenomas; no Rec, no recurrent 
adenomas. Unlabeled vectors are related to variables that were not clearly associated with any distinct cluster of expression profiles. Larger adenomas were 
predictably associated with high-degree dysplasia. In contrast, their association with nonrecurrence was unexpected and probably due to the fact that 
patients who had already undergone endoscopic polypectomy (i.e., those with recurrence) presented relatively recent-onset (consequently, smaller) polyps at 
the study colonoscopy. B. CCA score plot with samples grouped by adenoma diameter. Each of the three size-related groups is delimited by an ellipse with 
the center labeled. The ellipse representing the adenomas measuring >20 mm in diameter shows very little overlap with those of the other two groups 
(adenomas with diameters of 20 mm and those with diameters of <20 mm). 



adenomatous glands (15). Our analysis of human tissues with 
preserved architecture indicated that these are also attributes 
of KIAA1199. In in situ hybridization studies, KIAA1199 
mRNA was detectable only in the lower portion of normal 
colonic epithelial crypts (Fig. 4A and B), and its expression 
levels were much higher in dysplastic glands (Fig. 4C). These 
patterns were confirmed at the protein level by 
immunohistochemistry done with an antibody raised in our laboratory 
(Fig. 4D-J). This analysis also revealed that KIAA1199 is a 
cytoplasmic protein whose expression is most intense near the 
cell membrane, particularly on the luminal side of the dysplastic 
cell multilayer (Fig. 4F-J). 



FIGURE 1. Unsupervised analyses of microarray data. A. Hierarchical clustering analysis. The 64 tissue samples represented on the X axis include 32 
normal mucosal samples (green branches) and 32 adenomas (red branches). Each probe plotted on the Y axis is color coded to indicate the level of 
expression of the gene relative to its median expression level across the entire tissue sample set (blue, low; red, high). In the adenoma dendrogram, 
branches representing individual samples and small groups merge at higher levels than those of the normal mucosa dendrogram, reflecting lower-level 
correlation (i.e., higher variability among the adenoma specimens). B. PCA. Profile plot of the normalized first principal component (PCA1) across the 64 
specimens (green dots, normal mucosa; red dots, adenomas). The two tissue groups differ significantly in terms of PCA1 (P < 0.0001), which accounted for 
25% of the total variance. Note the higher variability of the PCA1 values in the adenoma group (higher fluctuation). C. Correlation analysis. Tile plot 
visualization of the pairwise correlations of the samples. Correlation values are indicated on the grayscale column (white > black: high > low). High correlation 
is observed among the samples within each group (top right quadrant, adenomas; bottom left quadrant, normal mucosa), although the adenomas displayed 
somewhat greater diversity (i.e., on the whole, the gray tones in the top right quadrant are darker than those in the bottom left quadrant). Top left and bottom 
right quadrants, normal and adenoma samples are poorly correlated. However, samples from the same patient generally showed higher correlation than that 
observed between normal and adenoma samples from different patients (bright pixels on the secondary diagonals in the top left and bottom right quadrants). 
This finding probably reflects the strong influence of several factors, including the individual genetic background and lifestyle and the fact that the normal and 
adenomatous tissues from a given patient were from the same colon segment. D. CA of mRNA log(intensity) values of expressed genes from 27 of the 32 
tissue pairs (green dots, normal mucosa; red dots, adenoma). The other five pairs were excluded from this analysis because one of the two samples behaved 
as an outlier. Limiting our analysis to the more homogeneous pairs facilitated the comparison of the gene expression profiles for the two tissue groups and 
allowed more reliable identification of clinical/pathologic variables associated with profile scatter (see Fig. 2). The areas delimited by the ellipses represent 
95% of the estimated binormal distribution of the sample scores on the first and second CA axes. The map of the sample scores on the first two axes shows 
that CA efficiently discriminates between normal and adenoma samples. Higher variability is evident in the adenoma group, where the samples are more 
widely dispersed. 

Discussion 

Adenomatous colorectal polyps are one of the most common 
human tumors and the most frequent precancerous lesions in 
the colorectum, but their transcriptome has been only partially 
analyzed, and the data are generally based on a limited number 
of cases (17-20). We attempted to fill this gap by doing a 
comprehensive whole-genome microarray analysis of a large, 
highly homogenous set of adenomas that was collected 



A comparison of the transcriptomes of adenomatous polyps 
and segment-matched samples of normal colorectal mucosa 
revealed evidence of broad-scale remodeling. As a starting 
point for future verification studies, we have drawn up a list of 
478 genes that were significantly up-regulated (n = 153) or 
down-regulated (n = 325) in the adenomatous tissues (fold 
changes of ≥4; Supplementary Table S4). Space constraints 
preclude more than a cursory examination of this list, but we 
have highlighted in Table 2 certain aspects that we feel are 
particularly interesting in terms of their relevance to the process 
of adenoma formation. For instance, transcription regulation 
seems to be extensively modified. Twenty-nine molecules 
involved in this process were expressed in adenomas at levels 

FIGURE 3. KIAA1199 is a putative target of Wnt signaling. A. Degree of 
correlation between the expression of KIAA1199 mRNA and that of 19 known 
Wnt signaling target genes identified with the procedure described in 
Materials and Methods, Results, and Supplementary Fig. S4. For each of the 
20 genes, the graph shows the normalized intensity of expression level 
(plotted on the Y axis) in each of the 32 adenomas and corresponding 
samples of normal mucosa (X axis). B. Mean expression of each gene in 
normal mucosa (green dots) and adenomas (red dots). Bars, confidence 
interval. C. Northern blot showing reduced KIAA1199 expression in LS174T 
cells following doxycycline-mediated induction of β-catenin small 
interfering RNA, dominant-negative TCF4 (dnTCF4), or NH2-terminal-deleted 
TCF4 (N-TCF4). The ~8-kb band corresponds to full-length KIAA1199 mRNA. 
The lower band (~5 kb) may represent an alternative form of this mRNA. 
Dox, cell transfectants grown in the presence or absence of doxycycline; 
Trl +, a parental clone (i.e., cells expressing the repressor protein 
modified by doxycycline but not transfected with β-catenin small 
interfering RNA, dominant-negative TCF4, or NH2-terminal-deleted TCF4) 
used as a control of doxycycline exposure. Bottom, ethidium 
bromide-stained agarose gel as a loading control. 

FIGURE 4. Expression of KIAA1199 mRNA and protein in normal intestinal 
mucosa and colorectal tumors. In situ hybridization studies (A-C) 
localized KIAA1199 mRNA expression to the lower portion of normal 
epithelial crypts (A and B) and revealed that expression is markedly 
up-regulated in colorectal tumors (C). Asterisk, note the different levels 
of expression in tumor glands and normal crypts. D. KIAA1199 protein 
expression is also limited to the lower half of the normal colonic crypts, 
and a similar pattern is observed in the ileal mucosa (E), where the 
protein is expressed only in the crypts (not in the villi). In F and G, 
adenomatous crypts with low-grade dysplasia present increased expression 
of KIAA1199, particularly in the cytoplasm facing the crypt lumen, and in 
and around the mucin vacuoles of goblet cells (note the striking 
difference with goblet cells of normal crypts in both panels). The 
expression pattern changes dramatically during the transition from 
low-grade dysplasia with goblet cell differentiation (H) to high-grade 
dysplasia in which this differentiation is no longer apparent. J. In more 
advanced colon tumors, KIAA1199 overexpression is maintained. Note that, 
in I and J, the expression of KIAA1199 protein (like that of KIAA1199 
mRNA; C) is highest in the luminal portion of the dysplastic glands 
(arrowheads, multilayer of unstained nuclei occupying more than the basal 
half of the dysplastic epithelium). K. Normal mucosa, with the 
corresponding tumor in the inset. Negative control: KIAA1199 antibody 
preabsorbed with the peptide used to immunize rabbits. 
>4 higher or lower than those observed in the normal mucosa,
but there were also several smaller changes in this category
(Supplementary Table S6) that might also have dramatic effects
on gene expression. Several other alterations reported in Table 2
are noteworthy in terms of their potential effect on cell
proliferation, differentiation, apoptosis, and cell adhesion: (a)
up-regulation of four members of the REG (regenerating)
family of genes (21, 22), which would lead to increased tissue
mitogen expression; (b) up-regulation of LCN2 (23) and
down-regulation of ZFHX1B/SIP-1 (24) in the absence of significant
changes in the expression of the epithelial cadherin CDH1
(E-cadherin), which would prevent or delay the epithelial-
Mol Cancer Res 2007;5(12). December 2007



mesenchymal transition [changes were also noted in the
expression of other cell adhesion genes of the cadherin and
claudin families, including the striking overexpression of the
placental cadherin gene CDH3, which is associated with early
events in the transformation process (25, 26)]; (c)
down-regulation of ZFHX1B/SIP-1 and Max dimerization protein 1
(MXD1/MAD1; decreased only 3.3-fold and therefore not listed
in Table 2; refs. 27, 28) and overexpression of the RTEL1
helicase, which should facilitate telomere elongation (29); (d)
alterations that would diminish apoptosis [e.g., overexpression
of the decoy receptor for Fas ligand, TNFRSF6B, which is
reportedly coregulated with RTEL1 on chromosome 20q13.3
Table 2. Genes Most Likely to be Involved in the Development and Evolution of Colorectal Adenomas (A Subset of Genes
Listed in Supplementary Table S4) Subdivided by Gene Ontology Category

Gene symbol   Gene name                                                       Fold differences*

Regulation of transcription
[illegible]   Nuclear localized factor 1                                      ↑ 33.1
FOXQ1         Forkhead box Q1                                                 ↑ 24.4
MSX2          Msh homeobox homologue 2                                        ↑ 22.2
ASCL2         Achaete-scute complex-like 2                                    ↑ 77.3
[illegible]   [illegible] homeobox homologue 1                                ↑ 8.5
IRX3          Iroquois homeobox protein 3                                     ↑ 8.4
GRHL3         Grainyhead-like 3                                               ↑ 7.9
TRIM29        Tripartite motif-containing 29                                  ↑ 7.4
ETV4          Ets variant gene 4 (E1A enhancer binding protein, E1AF)         ↑ 5.4
ARNTL2        Aryl hydrocarbon receptor nuclear translocator-like 2           ↑ 5.3
TEAD4         TEA domain family member 4                                      ↑ 5.2
SP5           Sp5 transcription factor                                        ↑ 5.2
HES6          Hairy and enhancer of split 6                                   ↑ 4.8
TBX3          T-box 3                                                         ↑ 4.6
NFE2L3        Nuclear factor (erythroid-derived 2)-like 3                     ↑ 4.3
GRHL1         Grainyhead-like 1                                               ↑ 4.2
FEV           FEV (ETS oncogene family)                                       ↓ [illegible]
SPIB          Spi-B transcription factor                                      ↓ [illegible]
NEUROD1       Neurogenic differentiation 1                                    ↓ 10.6
MEIS1         Meis1, myeloid ecotropic viral integration site 1               ↓ 7.1
NR3C1         Nuclear receptor subfamily 3, group C, member 1                 ↓ 5.9
NR5A2         Nuclear receptor subfamily 5, group A, member 2                 ↓ 5.6
THRB          Thyroid hormone receptor, β                                     ↓ 5.2
ZNF483        Zinc finger protein 483                                         ↓ 5.1
ZFHX1B        Zinc finger homeobox 1b (SIP-1)                                 ↓ 4.8
MEOX2         Mesenchyme homeobox 2                                           ↓ 4.7
HOXD10        Homeobox D10                                                    ↓ 4.6
MAF           v-maf musculoaponeurotic fibrosarcoma oncogene                  ↓ 4.5
SOX10         SRY (sex determining region Y)-box 10                           ↓ 4.2

Cell proliferation/differentiation/apoptosis
REG1B         Regenerating islet-derived 1β                                   ↑ 75.8
REG3A         Regenerating islet-derived 3α                                   ↑ 29.5
TACSTD2       Tumor-associated calcium signal transducer 2                    ↑ 21.4
IL-8          Interleukin-8                                                   ↑ 14.7
SERPINB5      Serpin peptidase inhibitor, clade B, member 5 (Maspin)          ↑ [illegible]
REG1A         Regenerating islet-derived 1α                                   ↑ 8.2
FAIM2         Fas apoptotic inhibitory molecule 2                             ↑ 7.5
DUSP4         Dual specificity phosphatase 4                                  ↑ 7.4
REG4          Regenerating islet-derived family, member 4                     ↑ 6.8
PHLDA1        Pleckstrin homology-like domain, family A, member 1             ↑ 6.0
LCN2          Lipocalin 2 (oncogene 24p3)                                     ↑ 5.7
RTEL1         Regulator of telomere elongation helicase 1                     ↑ 5.6
TGFBI         Transforming growth factor, β-induced                           ↑ 5.2
IGFBP2        Insulin-like growth factor binding protein 2                    ↑ 4.8
TDGF1         Teratocarcinoma-derived growth factor 1                         ↑ 4.7
TNFRSF6B      Tumor necrosis factor receptor superfamily, member 6b, decoy    ↑ 4.5
DMBT1         Deleted in malignant brain tumors 1                             ↑ 4.2
TNFRSF10C     Tumor necrosis factor receptor superfamily, member 10c, decoy   ↑ 4.1
ANGPTL1       Angiopoietin-like 1 (Angioarrestin)                             ↓ 24.9
CDKN2B        Cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4)       ↓ 14.9
GPM6B         Glycoprotein M6B                                                ↓ 11.5
ANK2          Ankyrin 2                                                       ↓ 9.8
UNC5C         Unc-5 homologue C                                               ↓ 7.4
HPGD          Hydroxyprostaglandin dehydrogenase 15-(NAD)                     ↓ 6.1
CPNE8         Copine VIII                                                     ↓ 5.5
FAIM3         Fas apoptotic inhibitory molecule 3                             ↓ [illegible]
IL6R          Interleukin-6 receptor                                          ↓ 4.8
TUSC3         Tumor suppressor candidate 3                                    ↓ 4.7
DUSP1         Dual specificity phosphatase 1                                  ↓ [illegible]
RERG          RAS-like, estrogen-regulated, growth inhibitor                  ↓ 4.6
NDN           Necdin                                                          ↓ 4.5
IGF1          Insulin-like growth factor 1 (somatomedin C)                    ↓ 4.0

Cell adhesion
CDH3          Cadherin 3, type 1, P-cadherin                                  ↑ 81.7
CLDN2         Claudin 2                                                       ↑ 16.1
CLDN1         Claudin 1                                                       ↑ 9.0
DSG3          Desmoglein 3                                                    ↑ 7.3

(Continued on the following page)
Table 2. Genes Most Likely to be Involved in the Development and Evolution of Colorectal Adenomas (A Subset of Genes
Listed in Supplementary Table S4) Subdivided by Gene Ontology Category (Cont'd)

Gene symbol   Gene name                                                       Fold differences*

DSG4          Desmoglein 4                                                    [illegible]
CLDN8         Claudin 8                                                       ↓ 25.8
CDH19         Cadherin 19, type 2                                             ↓ 8.3
CEACAM7       Carcinoembryonic antigen-related cell adhesion molecule 7       ↓ 8.3
CLDN23        Claudin 23                                                      ↓ 8.0
NRXN1         Neurexin 1                                                      ↓ 7.1
PCDH19        Protocadherin 19                                                ↓ 6.8
NLGN4X        Neuroligin 4, X-linked                                          ↓ 6.0
TNXB          Tenascin XB                                                     ↓ 5.6
MUCDHL        Mucin and cadherin-like                                         ↓ [illegible]
PCDH9         Protocadherin 9                                                 ↓ 4.9
L1CAM         L1 cell adhesion molecule                                       ↓ 4.2

*Overexpressed (↑) or underexpressed (↓) in adenomas (versus normal mucosa samples).



(30-32); decreased expression of the netrin-1 receptor, UNC5C
(33); and expression changes involving three Fas apoptosis
inhibitory molecules (FAIM), including FAIM1, which was
increased 2.3-fold and is thus not listed in Table 2]; and (e)
marked down-regulation of several genes that would result in
reduced tumor suppression activity [e.g., those encoding the
antiangiogenic factor ANGPTL1 (34), the cyclin-dependent
kinase inhibitor CDKN2B/p15, and the prostaglandin catabolism
enzyme HPGD (35)].

It is also important to recall the size-related differences noted
in the adenoma gene expression profiles (Fig. 2; Supplementary
Table S3). When validated in a larger series of tumors, these
differences should provide important clues to the molecular
basis of the well-known link between the dimensions and
malignant potential of colorectal adenomas (1).

Our study also furnishes a complete picture of expression changes
involving gene components of the Wnt pathway across the transition from
normal to adenomatous epithelium (Supplementary Table S2) as well as
evidence for the existence of a novel Wnt target: KIAA1199. This gene,
which encodes a protein of unknown function, was strikingly overexpressed
in all the adenomas included in this study and in 25 adenocarcinomas of the
colon described in a previous report (8). Even more intriguingly, its
expression was significantly correlated with that of several genes that are
well-established targets of Wnt signaling. Our hypothesis that KIAA1199 is
up-regulated by the TCF(s)/β-catenin transcription complex was considerably
strengthened by the marked decreases in KIAA1199 expression observed in
cultured colorectal cancer cells when the Wnt pathway was inhibited by
overexpression of dominant-negative TCF4 proteins or by β-catenin
knockdown. It is not yet clear whether this is a direct effect, but this
possibility is supported by the results of a recent genome-wide TCF4
ChIP-on-chip analysis, which indicates that the KIAA1199 locus is
surrounded by four TCF4-bound regions.10 These findings are consistent with
the probable role of this gene as a direct target of TCF4/β-catenin
signaling in the intestine and in colorectal tumors.



10 Hatzis et al., unpublished data.



Other features of KIAA1199 expression are also compatible
with its putative role as a Wnt target gene. KIAA1199 mRNA
and protein are both confined to the proliferative compartment
of normal intestinal crypts, where Wnt signaling is normally
active, and they are highly overexpressed in colorectal
adenomas and carcinomas, where this pathway is almost
always aberrantly activated.

In normal and tumor tissues, KIAA1199 is expressed in the cytoplasm of
epithelial cells. In glands with low-degree dysplasia, higher
concentrations are observed in the mucin vacuoles of goblet cells, but
cytoplasmic expression of the protein in tumor cells remains elevated even
after goblet cell differentiation has been lost (Fig. 4). These features,
together with the localization of KIAA1199 in the luminal portion of the
cytoplasm, are suggestive of a secreted and/or membrane protein. This
conclusion is consistent with our in silico analysis of KIAA1199 (see
Supplementary Data and Supplementary Fig. S5), which strongly predicts the
presence of a signal peptide at its NH2-terminal end. In addition, the
central region of KIAA1199 contains a TMEM2 homology domain, which is
present in several eukaryotic proteins, including TMEM2, polyductin
(PKHD1), and fibrocystin L (PKHD1L1; Fig. 5), all large receptor proteins
characterized by an NH2-terminal signal peptide or a single transmembrane
helix and a short cytoplasmic tail (36).

A study based on yeast two-hybrid screens suggested that
KIAA1199 may interact with plexin A2 (KIAA0483; ref. 37).
The transmembrane plexins interact with transmembrane
semaphorins on nearby cells, providing "stop" and "go"
signals that are crucial for cell motility and invasive growth
(38, 39). KIAA1199/plexin A2 interaction could thus play
important roles in colorectal tumorigenesis not only in the
invasive stages but also earlier during the formation of
abnormal glands in benign adenomas.

A recent report linked high levels of KIAA1199 mRNA with
cell mortality in human fibroblasts and in a renal cell carcinoma
cell line (40). In that study, however, there was no significant
increase in KIAA1199 expression during replicative aging of
mortal cells, and this finding contrasts with the documented
behavior of other genes involved in cell aging (41). Furthermore,
FIGURE 5. Phylogenetic tree of the proteins containing the TMEM2 homology domain found in the central region of KIAA1199. The tree was generated
(52) from the multiple sequence alignment shown in Supplementary Fig. S5. It was calculated with the minimum evolution algorithm and the JTT
matrix. Positions with gaps were removed for calculation of pairwise distances. Node robustness was assessed using the bootstrap method with 100
resamplings. (Bootstrap values are shown at the nodes.) Two branches emerged, one comprising KIAA1199 and TMEM2 and the other with polyductin,
fibrocystin L, and several other THD-containing proteins found in the ciliate Tetrahymena thermophila, which were apparently generated in a series of
Tetrahymena-specific gene duplications. The NH2-terminal repeats of polyductin and fibrocystin L clustered together, as did the COOH-terminal repeats,
suggesting that the intragenic duplication of the TH domain in the ancestor of polyductin and fibrocystin L occurred before the divergence of chordates and
echinoderms (more details in Supplementary Data).
the authors reported wide variation in KIAA1199 mRNA
expression in breast cancer cell lines, and this finding raises
the possibility that expression of this gene in vivo and in cell
lines may differ.

We believe that our microarray data will serve as a springboard and
reference point for other studies on the molecular basis of colorectal
transformation along the adenoma-carcinoma pathway (and subsequently for
the study of alternative pathways). Some of the transcriptional changes
reported in this study might one day be used as molecular indices of the
susceptibility of adenomas to malignant transformation, information that
would be helpful in planning appropriate follow-up of the lesions. As for
KIAA1199, its invariably high expression in the colorectal tumors we
studied raises interesting possibilities for the development of a new
molecular marker for the detection of these neoplasms. For example, because
KIAA1199 expression in the normal mucosa is limited to cells in the lower
portion of the crypts, which are not yet programmed to be shed into the
intestinal lumen, the presence of KIAA1199 peptides in fecal water might
prove to be a specific marker of adenomatous lesions. In addition, although
due consideration must be given to its probable physiologic role(s) in
intestinal crypts and possibly in several other human tissues (40, 42, 43),
KIAA1199 may be a potential target of antibody-based therapies.

Materials and Methods
Tumor Samples 

Pedunculated colorectal polyps and normal mucosa were obtained during
colonoscopies carried out in the Gastroenterology Unit of the Belcolle City
Hospital (Viterbo, Italy). The tissues were collected prospectively with
informed patient consent and the approval of the local Human Research
Ethics Committee. Patients with documented familial polyposis, with >15
adenomatous polyps (total: synchronous + previously excised; ref. 44), or
currently treated with nonsteroidal anti-inflammatory drugs (including
aspirin) were excluded from the study.

For each polyp, three biopsies of normal mucosa were
collected from the same colon segment (≥2 cm from the site of
the polyp). Immediately after removal, a small sample of
epithelial tissue (5-15 mg) was cut from the tip of each polyp,
leaving the underlying muscularis mucosae intact. We excluded
polyps <1 cm to ensure that the sampling procedure would not
interfere with the histologic diagnosis. All polyp samples were
collected by a single operator (M.dP.) using the same procedure
to minimize artifacts due to sampling differences. The approach
used allowed us to obtain specimens with a high percentage of
epithelial cells without resorting to microdissection, which can
diminish the quantity and quality of the extracted RNA.

The polyp sample and the three normal mucosal biopsies were immersed in
RNAlater (Ambion) for subsequent microarray analysis, and the remainder of
the polyp was submitted for pathologic analysis. The cut surface at the tip
was labeled with India ink so that the sampled area could be easily
identified during routine histologic examination. The tissue was then fixed
in buffered formalin and embedded in paraffin. DNA extracted from sections
of this specimen was also used to rule out microsatellite instability
(reflecting defective DNA mismatch repair) at the BAT26 locus, as
previously described (45).



All of the polyps included in the study met the following
criteria: type 0-Ip (6), maximum diameter of 1 to 4 cm, absence
of surface ulceration, histologic diagnosis of adenoma, and
absence of microsatellite instability at BAT26.

In some analyses, we also included transcriptomic data from
a previously described set of 25 colon cancers (mismatch repair
proficient and deficient; ref. 8), which we reanalyzed for this
study with the same microarray used to characterize the
adenomas and normal mucosa.

Microarray Analysis, Real-time Reverse Transcription- 
PCR, and Northern Blotting 

Total RNA was extracted (RNeasy Mini kit, Qiagen) from homogenized tissue
samples (5-15 mg), and its integrity was verified by capillary gel
electrophoresis (Bioanalyzer, Agilent Technologies). Complementary RNA
(15 μg/sample), synthesized and labeled as previously described (8, 46),
was hybridized with the Affymetrix U133 Plus 2.0 array, which contains in
situ synthesized oligonucleotides representing the entire human genome
(54,675 probes).

Raw gene expression data generated by GeneChip Operating
Software (Affymetrix) were imported into the GeneSpring
software program (Agilent Technologies) and normalized per
chip (i.e., to the median of all values on a given array) and per
gene (i.e., to the median expression level of the given gene
across all samples). Analysis was done using the log expression
values with GeneSpring's cross-gene error model turned on.
Probes were excluded from analysis unless they were listed as
"present or marginal calls" and/or had expression values ≥100
in ≥50% (≥16 of 32) of the samples in at least one of the tissue
groups (adenomas and normal mucosa).
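
The two normalization steps and the intensity filter described above can be illustrated with a minimal sketch. It is not the GeneSpring implementation: the function names `normalize` and `passes_filter` and the nested-dictionary data layout are assumptions made for illustration, the cross-gene error model is not reproduced, and the filter covers only the intensity criterion (not the present or marginal detection calls).

```python
from statistics import median


def normalize(raw):
    """Per-chip then per-gene median scaling.

    raw: dict mapping sample name -> dict mapping probe id -> raw intensity
    (a hypothetical layout chosen for this sketch).
    """
    # Per-chip step: divide every value on an array by that array's median.
    per_chip = {}
    for sample, probes in raw.items():
        chip_median = median(probes.values())
        per_chip[sample] = {p: v / chip_median for p, v in probes.items()}

    # Per-gene step: divide each probe by its median across all samples.
    probe_ids = list(next(iter(per_chip.values())))
    gene_median = {p: median(s[p] for s in per_chip.values()) for p in probe_ids}
    return {
        sample: {p: v / gene_median[p] for p, v in probes.items()}
        for sample, probes in per_chip.items()
    }


def passes_filter(group_values, threshold=100.0, min_fraction=0.5):
    """True if at least min_fraction of one tissue group's raw values reach threshold."""
    n_high = sum(v >= threshold for v in group_values)
    return n_high >= min_fraction * len(group_values)
```

With 32 samples per tissue group, `passes_filter` returns True when 16 or more raw values reach 100, matching the cutoff stated in the text; a probe would be retained if this holds for either group.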

Expression data were subjected to four different unsupervised
analyses: (a) hierarchical clustering using the Pearson
correlation coefficient as a similarity measure and the average
linkage algorithm for branch merging; (b) PCA, which reduces
the dimensionality (number of variables) of a data set while
retaining most of its variance (8); (c) correlation analysis, which
involved computation of Pearson correlation coefficients for all
possible sample pairs and visualization of correlation values as
tile plots; and (d) CA, another dimension-reducing method (41),
which was used to identify samples associated with particular
gene expression levels. In typical CA, a matrix of n gene
expression levels from p samples is treated as a two-way
contingency table (genes by samples or vice versa) with n and p
specifications for the "factors" gene and sample, respectively.
Each intensity value thus reflects the abundance of a given
transcript in a given sample. Like PCA, CA identifies
independent "factorial components" that account for variance
within a multidimensional gene data set, but in this case, the
components are identified and ranked according to the
correlation between gene and sample scores. A supervised or
constrained extension of CA (9), CCA, was then used to identify
possible correlations between gene expression patterns and
clinical or pathologic variables. CA and CCA, as well as the
corresponding plots, were computed using R software and the
ade4 and made4 packages furnished by Bioconductor.11
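
Analysis (a) can be sketched as follows, purely as an illustration of the technique and not as the software actually used. The sketch takes 1 minus the Pearson correlation coefficient as the distance between two samples (a common convention, assumed here) and merges clusters by average linkage; `pearson` and `average_linkage` are hypothetical helper names.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of samples.

    profiles: dict mapping sample name -> list of expression values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    # Distance between two samples: 1 - Pearson r (assumed convention).
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(pair):
        # Average linkage: mean of all pairwise sample distances between clusters.
        c1, c2 = pair
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        merges.append((c1, c2, linkage((c1, c2))))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

The merge history lists the closest clusters first, so samples with highly correlated profiles (for example, two normal mucosa samples) are joined before clusters whose profiles are anticorrelated.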
The Mann-Whitney test was used to select genes differentially
expressed in normal mucosa and adenomas; Benjamini-Hochberg
multiple testing correction was applied with a false
discovery rate of 0.01. The genes in this set that were
differentially expressed with fold differences of ≥2.0 were
then analyzed with ErmineJ software (48) to identify any
biological processes from the Gene Ontology database (49) that
were overrepresented.
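
The Benjamini-Hochberg step-up procedure and the fold-difference cutoff can be sketched as below. This is an illustrative sketch only: it assumes the per-gene Mann-Whitney P values have already been computed elsewhere, and the function names `benjamini_hochberg` and `fold_difference` are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.01):
    """Return a list of booleans, True where the hypothesis is rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def fold_difference(mean_adenoma, mean_normal):
    """Fold difference as a ratio >= 1, regardless of the direction of change."""
    ratio = mean_adenoma / mean_normal
    return ratio if ratio >= 1 else 1 / ratio
```

A gene would then be passed on to the Gene Ontology analysis when its entry in the `benjamini_hochberg` result is True and its `fold_difference` is at least 2.0.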

Pearson correlation was used to identify correlation between
KIAA1199 expression and the expression of other genes in the
entire set of tissue samples. Fisher's exact test was used to
identify possible overrepresentation of known Wnt targets
among genes whose expression was closely correlated with that
of KIAA1199 (correlation values ≥0.8).
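
A minimal sketch of this two-step analysis follows. It is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, and the one-sided (overrepresentation) form of Fisher's exact test is assumed because the text does not say which alternative was tested.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_genes(target, expression, cutoff=0.8):
    """Genes whose expression correlates with the target gene at r >= cutoff.

    expression: dict mapping gene symbol -> list of values across all samples.
    """
    reference = expression[target]
    return [
        gene for gene, values in expression.items()
        if gene != target and pearson(reference, values) >= cutoff
    ]


def fisher_overrepresentation(k, n, K, N):
    """One-sided Fisher's exact P value, P(X >= k), from the hypergeometric distribution.

    N: genes tested; K: known Wnt targets among them;
    n: genes correlated with the target; k: correlated genes that are Wnt targets.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)
```

A small P value from `fisher_overrepresentation` indicates that known Wnt targets occur among the KIAA1199-correlated genes more often than expected by chance.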

Reverse transcription-PCR and Northern blotting were done
as previously described (46, 50) to verify the expression level
of KIAA1199 in tissue samples and in LS174T colon cancer
cells in which inducible inhibition of the Wnt pathway had been
achieved with previously described methods (14-16).

In situ Hybridization

Digoxigenin-labeled KIAA1199 antisense riboprobes were synthesized from a
PCR product amplified from human colon cDNA with KIAA1199-specific primers
(sense: 5'-cacatcggggaggagataga-3'; antisense, containing a T7 RNA
polymerase-binding site: 5'-taatacgactcactatagggttccagacttgaca-3'). This
product was transcribed in vitro using the DIG RNA labeling kit and T7 RNA
polymerase (Roche Diagnostics). In situ hybridizations were done on
paraffin-embedded sections of human colon fixed with 4% buffered formalin
as described elsewhere (51).

Immunohistochemistry

Our in silico analysis of KIAA1199 (see Supplementary Data)
indicated that residues 202 to 217 (IHSDRFDTYRSKKESE)
form a loop between a conserved β-strand and the following
helix of the NH2-terminal GG domain. This charged, surface-exposed
peptide was used to raise a rabbit polyclonal antibody,
which was purified by affinity chromatography on Thiopropyl
Sepharose 6B (Amersham) derivatized with the antigenic
peptide. A 1:1,000 dilution of this antibody was used, as
previously described (45), to evaluate KIAA1199 expression in
formalin-fixed, paraffin-embedded sections of adenoma and
normal mucosal tissues.